author | Markus Unterwaditzer <markus-honeypot@unterwaditzer.net> | 2021-11-28 00:05:21 +0100 |
---|---|---|
committer | Markus Unterwaditzer <markus-honeypot@unterwaditzer.net> | 2021-11-28 00:05:21 +0100 |
commit | e14abf483b238da4d5b69dbc425b2ab80d1c3e98 (patch) | |
tree | 09801b839a98441793dafa8bd326d5df3f38d201 /README.md | |
parent | 95afc5359e940398498310d46e81352f04b43a49 (diff) |
clarify what html5gum isn't, fix #5
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 16 |
1 files changed, 8 insertions, 8 deletions

    @@ -30,16 +30,16 @@ for token in Tokenizer::new(html).infallible() {
     assert_eq!(new_html, "<title>hello world</title>");
     ```
     
    -It fully implements [13.2 of the WHATWG HTML
    -spec](https://html.spec.whatwg.org/#parsing) and passes [html5lib's tokenizer
    -test suite](https://github.com/html5lib/html5lib-tests/tree/master/tokenizer),
    -except that:
    +It fully implements [13.2.5 of the WHATWG HTML
    +spec](https://html.spec.whatwg.org/#tokenization), i.e. is able to tokenize HTML documents and passes [html5lib's tokenizer
    +test suite](https://github.com/html5lib/html5lib-tests/tree/master/tokenizer). Most importantly it does not:
     
    -* this implementation requires all input to be Rust strings and therefore valid
    -  UTF-8. There is no charset detection or handling of invalid surrogates, and
    -  the relevant html5lib tests are skipped in CI.
    +* [Implement charset detection.](https://html.spec.whatwg.org/#determining-the-character-encoding) This implementation requires all input to be
    +  Rust strings and therefore valid UTF-8.
     
    -* there's some remaining testcases to be decided on at [issue 5](https://github.com/untitaker/html5gum/issues/5).
    +* [Correct mis-nested tags](https://html.spec.whatwg.org/#an-introduction-to-error-handling-and-strange-cases-in-the-parser)
    +
    +* Generally qualify as a complete HTML *parser* as per the WHATWG spec (yet).
     
     A distinguishing feature of `html5gum` is that you can bring your own token
     datastructure and hook into token creation by implementing the `Emitter` trait.
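
The revised README says that input must be a Rust string (and therefore valid UTF-8) and that charset detection is out of scope. A caller holding raw bytes therefore has to decode them before tokenizing. The following is a minimal sketch of one way to do that using only the standard library. The helper name `decode_for_tokenizer` is hypothetical and not part of html5gum, and the lossy fallback is an assumption about what a caller might want, not something the README prescribes.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of html5gum): turn raw bytes into a string
/// that a UTF-8-only tokenizer can accept. No charset detection happens here;
/// invalid byte sequences are simply replaced with U+FFFD.
fn decode_for_tokenizer(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    // Valid UTF-8 is borrowed unchanged.
    let valid = decode_for_tokenizer(b"<title>hello world</title>");
    assert_eq!(valid, "<title>hello world</title>");

    // 0xFF never occurs in well-formed UTF-8, so it is replaced.
    let invalid = decode_for_tokenizer(b"<p>\xFF</p>");
    assert_eq!(invalid, "<p>\u{FFFD}</p>");
}
```

The resulting string can then be handed to the tokenizer in place of `html` in the README's example. A caller that prefers to reject invalid input rather than repair it could use `std::str::from_utf8` and handle the error instead.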