author Martin Fischer <martin@push-f.com> 2023-09-26 08:22:21 +0200
committer Martin Fischer <martin@push-f.com> 2023-09-28 10:57:11 +0200
commit ee8ab781672e7ab608e74a5b605eb189828f0afe
tree a5fcb14faf4dc2c29a4ed9f6a0c087acf8bcfb0d
parent ade1034c46f3c5e50bdf07fd1556b7957011fe98
fix(tokenizer): don't lowercase temp chars in ScriptDataEndTagName

This bug resulted in e.g. "<script></SCRI" being wrongly tokenized as:

    StartTag(StartTag { name: "script", self_closing: false, attributes: {} })
    Char('<')
    Char('/')
    Char('s')
    Char('c')
    Char('r')
    Char('i')
    EndOfFile

Note that the Char tokens should be uppercase.

(This bug could only be observed when properly doing state switching
via tree construction.)
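
For illustration only, the behavior can be modelled with a small standalone sketch. It is not the crate's code: the `Token` enum, the `flush_unfinished_end_tag` function and the `lowercase_temp_buffer` flag are hypothetical names made up for this example. It assumes the state works roughly as in the HTML tokenization rules, where the tag name is lowercased, the temporary buffer keeps the characters as they were read, and the buffer is flushed as character tokens when the end tag is never completed.

```rust
/// Hypothetical, simplified token type (not the crate's `Token` enum).
#[derive(Debug, PartialEq)]
enum Token {
    Char(char),
    EndOfFile,
}

/// Toy model of the ScriptDataEndTagName state for input that ends
/// before the end tag is closed. `rest` is the text following `</`.
/// Returns the (lowercased) tag name and the tokens emitted on flush.
///
/// `lowercase_temp_buffer = true` reproduces the old, buggy behavior;
/// `false` reproduces the fixed behavior.
fn flush_unfinished_end_tag(rest: &str, lowercase_temp_buffer: bool) -> (String, Vec<Token>) {
    let mut tag_name = String::new();
    let mut temporary_buffer = String::new();

    for c in rest.chars() {
        if !c.is_ascii_alphabetic() {
            break;
        }
        // The tag name is always compared case-insensitively, so it is lowercased.
        tag_name.push(c.to_ascii_lowercase());
        // The temporary buffer holds what is re-emitted as text,
        // so it should keep the original case.
        temporary_buffer.push(if lowercase_temp_buffer {
            c.to_ascii_lowercase()
        } else {
            c
        });
    }

    // The end tag was never completed: emit "</" and the buffered characters.
    let mut tokens = vec![Token::Char('<'), Token::Char('/')];
    tokens.extend(temporary_buffer.chars().map(Token::Char));
    tokens.push(Token::EndOfFile);
    (tag_name, tokens)
}
```

Calling `flush_unfinished_end_tag("SCRI", false)` in this model yields the tag name `scri` and the character tokens `<`, `/`, `S`, `C`, `R`, `I` followed by `EndOfFile`; passing `true` yields the lowercase letters described above.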

-rw-r--r-- CHANGELOG.md 5
-rw-r--r-- src/tokenizer/machine.rs 2
2 files changed, 6 insertions, 1 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b05e51f..2702e96 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,11 @@
* Added a blanket implementation to implement `Reader` for boxed readers.
+#### Fixes
+
+* Removed incorrect lowercasing of char tokens when
+ an eof-in-tag error occurred in a `</script>` tag.
+
#### Breaking changes
* Byte offsets were moved out of the `Token` enum into a new `Trace` enum.
diff --git a/src/tokenizer/machine.rs b/src/tokenizer/machine.rs
index 944eb01..100f645 100644
--- a/src/tokenizer/machine.rs
+++ b/src/tokenizer/machine.rs
@@ -428,7 +428,7 @@ where
             }
             Some(x) if x.is_ascii_alphabetic() => {
                 slf.push_tag_name(ctostr!(x.to_ascii_lowercase()));
-                slf.temporary_buffer.push(x.to_ascii_lowercase());
+                slf.temporary_buffer.push(x);
                 Ok(ControlToken::Continue)
             }
             c => {