path: root/src/reader.rs
Age  Commit message  Author
2023-09-03  fix!: make PosTrackingReader encoding-independent  Martin Fischer
While much of the span logic currently assumes UTF-8, we also want to support other character encodings, such as UTF-16, where characters can take up more or fewer bytes than in UTF-8.
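To illustrate why byte-based spans depend on the encoding, here is a minimal sketch that uses only the standard library. The helper name `encoded_sizes` is made up for this example and is not part of the crate.

```rust
/// Returns how many bytes `c` occupies in UTF-8 and in UTF-16.
/// (`encoded_sizes` is a hypothetical helper, not part of html5tokenizer.)
fn encoded_sizes(c: char) -> (usize, usize) {
    // `len_utf16` counts 16-bit code units, so multiply by 2 to get bytes.
    (c.len_utf8(), c.len_utf16() * 2)
}

fn main() {
    for c in ['a', 'é', '€', '😀'] {
        let (utf8, utf16) = encoded_sizes(c);
        println!("{:?}: {} byte(s) in UTF-8, {} byte(s) in UTF-16", c, utf8, utf16);
    }
}
```

For example, `'a'` needs one byte in UTF-8 but two in UTF-16, whereas `'€'` needs three bytes in UTF-8 but only two in UTF-16, so a position counted in bytes is only meaningful relative to a specific encoding.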
2023-09-03  fix: BufReadReader skips line on invalid UTF-8  Martin Fischer
2023-09-03  test: verify BufReadReader skips line on invalid UTF-8  Martin Fischer
2023-09-03  docs: document that BufReadReader reads UTF-8  Martin Fischer
2023-08-19  feat: introduce NaiveParser  Martin Fischer
2023-08-19  break!: stop re-exporting reader traits & types  Martin Fischer
This is primarily done to make the rustdoc more readable (by grouping Reader, IntoReader, StringReader and BufReadReader in the reader module). Ideally IntoReader is already implemented for your input type and you don't have to concern yourself with these traits / types at all.
2023-08-19  docs: remove `crate::` from link labels  Martin Fischer
2023-08-19  break!: merge Tokenizer::new_with_emitter into Tokenizer::new  Martin Fischer
The Tokenizer does not perform any state switching, since proper state switching requires a feedback loop between tokenization and DOM tree building. Using the Tokenizer directly is therefore a bit of a pitfall, since you might not expect it to tokenize e.g. `<script><b>` as:

    StartTag(StartTag { name: "script", .. })
    StartTag(StartTag { name: "b", .. })

Since we don't want to make walking into pitfalls particularly easy, this commit changes the Tokenizer::new method so that you have to specify the Emitter. Since this makes new_with_emitter redundant, it is removed.
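The pitfall can be reproduced with a toy scanner that, like a tokenizer without state switching, treats every `<name>` the same way. This is only an illustrative sketch: `naive_start_tags` is a hypothetical function and deliberately does not use the html5tokenizer API.

```rust
/// Toy scanner (NOT the html5tokenizer API): collects start tag names while
/// always staying in the same state, i.e. without any state switching.
fn naive_start_tags(input: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut rest = input;
    while let Some(open) = rest.find('<') {
        let after = &rest[open + 1..];
        let close = match after.find('>') {
            Some(index) => index,
            None => break, // unterminated tag: stop scanning
        };
        // The tag name is everything up to the first whitespace.
        let name = after[..close].split_whitespace().next().unwrap_or("");
        // Only names starting with an ASCII letter count as start tags;
        // this skips end tags (`</b>`) and markup such as `<!-- -->`.
        if name.chars().next().map_or(false, |c| c.is_ascii_alphabetic()) {
            names.push(name.to_ascii_lowercase());
        }
        rest = &after[close + 1..];
    }
    names
}

fn main() {
    // A spec-compliant parser would treat `<b>` inside `<script>` as text.
    println!("{:?}", naive_start_tags("<script><b>"));
}
```

Because the scanner never switches to a script-specific state after seeing `<script>`, it reports `b` as a start tag, which mirrors the behavior described in the commit message.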
2023-08-19  docs: move note about Reader impls to Reader trait  Martin Fischer
2023-08-19  break!: remove Never in favor of std::convert::Infallible  Martin Fischer
This change is a backport of 04e6cbe[1] from html5gum.

[1]: https://github.com/untitaker/html5gum/commit/04e6cbe44bb7a388bd61d1c9cfe4c618eb3b0e29
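As a rough sketch of what `std::convert::Infallible` offers as an error type: the trait `ByteSource`, the struct `SliceSource`, and the helper `unwrap_infallible` below are hypothetical names invented for this example and are not the crate's actual definitions.

```rust
use std::convert::Infallible;

/// Hypothetical reader-like trait, for illustration only.
trait ByteSource {
    type Error;
    fn next_byte(&mut self) -> Result<Option<u8>, Self::Error>;
}

/// Reading from an in-memory slice cannot fail, so `Error = Infallible`.
struct SliceSource<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> ByteSource for SliceSource<'a> {
    type Error = Infallible;

    fn next_byte(&mut self) -> Result<Option<u8>, Infallible> {
        let byte = self.bytes.get(self.pos).copied();
        if byte.is_some() {
            self.pos += 1;
        }
        Ok(byte)
    }
}

/// Unwraps a result whose error type has no values; no panic path is needed.
fn unwrap_infallible<T>(result: Result<T, Infallible>) -> T {
    match result {
        Ok(value) => value,
        // `Infallible` is an empty enum, so this arm can never be reached.
        Err(never) => match never {},
    }
}

fn main() {
    let mut source = SliceSource { bytes: b"hi", pos: 0 };
    while let Some(byte) = unwrap_infallible(source.next_byte()) {
        println!("{}", byte as char);
    }
}
```

Using the standard library type rather than a crate-local empty enum means callers can rely on the conversions and trait implementations that `Infallible` already has.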
2023-08-19  docs: remove Tokenizer::new examples from Reader docs  Martin Fischer
2023-08-19  break!: rename Readable to IntoReader  Martin Fischer
The analogous trait in the standard library is likewise called IntoIterator, not Iterable.
2021-12-05  rename to html5tokenizer, bump version  [v0.4.0]  Martin Fischer
2021-11-26  Read html from io::BufRead (#8)  Markus Unterwaditzer
2021-11-26  clean up reader interface  Markus Unterwaditzer
2021-11-24  hello world  Markus Unterwaditzer