# html5tokenizer

[docs.rs](https://docs.rs/html5tokenizer) [crates.io](https://crates.io/crates/html5tokenizer)

Spec-compliant HTML parsing [requires both tokenization and tree-construction][parsing model].
While this crate implements a spec-compliant HTML tokenizer it does not implement any
tree-construction. Instead it just provides a `NaiveParser` that may be used as follows:

```rust
use std::fmt::Write;
use html5tokenizer::{NaiveParser, Token};

let html = "<title >hello world</title>";
let mut new_html = String::new();

for token in NaiveParser::new(html).flatten() {
    match token {
        Token::StartTag(tag) => {
            write!(new_html, "<{}>", tag.name).unwrap();
        }
        Token::String(hello_world) => {
            write!(new_html, "{}", hello_world).unwrap();
        }
        Token::EndTag(tag) => {
            write!(new_html, "</{}>", tag.name).unwrap();
        }
        _ => panic!("unexpected input"),
    }
}

assert_eq!(new_html, "<title>hello world</title>");
```

This library can provide source spans. For an example, see [`examples/spans.rs`],
which produces the following output:

```output id=spans
note:
  ┌─ file.html:1:2
  │
1 │ <img src=example.jpg alt="some description">
  │  ^^^ ^^^ ^^^^^^^^^^^ ^^^  ^^^^^^^^^^^^^^^^ attr value
  │  │   │   │           │
  │  │   │   │           attr name
  │  │   │   attr value
  │  │   attr name
  │  tag name
```

## Limitations

* This crate does not yet implement tree construction
  (which is necessary for spec-compliant HTML parsing).

* This crate does not yet implement [character encoding detection].

* This crate does not yet implement spans for character tokens.

## Compliance & testing

The tokenizer passes the [html5lib tokenizer test suite].
The library is not yet fuzz tested.

## Credits

html5tokenizer was forked from [html5gum] 0.2.1, which was created by
Markus Unterwaditzer who deserves major props for implementing all 80 (!)
tokenizer states.

* Code span support has been added.
* The API has been revised.

For details please refer to the [changelog].

## License

Licensed under the MIT license, see [the LICENSE file].

[parsing model]: https://html.spec.whatwg.org/multipage/parsing.html#overview-of-the-parsing-model
[`examples/spans.rs`]: ./examples/spans.rs
[character encoding detection]: https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding
[html5lib tokenizer test suite]: https://github.com/html5lib/html5lib-tests/tree/master/tokenizer
[html5gum]: https://crates.io/crates/html5gum
[changelog]: ./CHANGELOG.md
[the LICENSE file]: ./LICENSE