Age | Commit message | Author | |
---|---|---|---|
2023-08-19 | docs: remove `crate::` from link labels | Martin Fischer | |
2023-08-19 | docs: move `produce ("emit")` clue to Emitter doc | Martin Fischer | |
2023-08-19 | break!: merge Tokenizer::new_with_emitter into Tokenizer::new | Martin Fischer | |
The Tokenizer does not perform any state switching, since proper state switching requires a feedback loop between tokenization and DOM tree building. Using the Tokenizer directly is therefore a bit of a pitfall, since you might not expect it to e.g. tokenize `<script><b>` as `StartTag(StartTag { name: "script", .. })` followed by `StartTag(StartTag { name: "b", .. })`. Since we don't want to make walking into pitfalls particularly easy, this commit changes the Tokenizer::new method so that you have to specify the Emitter. Since this makes new_with_emitter redundant, it is removed. | |||
2023-08-19 | docs: move note about Reader impls to Reader trait | Martin Fischer | |
2023-08-19 | break!: remove Default impl for Attribute | Martin Fischer | |
2023-08-19 | break!: remove Default impls for StartTag and EndTag | Martin Fischer | |
2023-08-19 | break!: privatize PosTrackingReader fields | Martin Fischer | |
2023-08-19 | break!: rename PosTracker to PosTrackingReader | Martin Fischer | |
2023-08-19 | break!: remove Never in favor of std::convert::Infallible | Martin Fischer | |
This change is a backport of 04e6cbe[1] from html5gum. [1]: https://github.com/untitaker/html5gum/commit/04e6cbe44bb7a388bd61d1c9cfe4c618eb3b0e29 | |||
2023-08-19 | break!: remove InfallibleTokenizer in favor of Iterator::flatten | Martin Fischer | |
2023-08-19 | docs: remove Tokenizer::new examples from Reader docs | Martin Fischer | |
2023-08-19 | break!: rename Readable to IntoReader | Martin Fischer | |
The corresponding standard library trait is likewise called IntoIterator, not Iterable. | |||
2023-08-19 | fix(docs): remove outdated list of Readable impls | Martin Fischer | |
dced8066f77f570dd3e396ec3570c71aa86c454e introduced a Readable impl for std::io::BufReader. Manually listing impls in a doc comment is a bad idea: such lists go out of date, and they are unnecessary because rustdoc automatically lists all implementations on the trait page. | |||
2023-08-19 | fix(docs): fix Error variant doc saying `$literal` | Martin Fischer | |
2023-08-19 | fix(docs): Span is a byte range (not character range) | Martin Fischer | |
2023-08-19 | fix(docs): StartTag is a start tag | Martin Fischer | |
2023-08-19 | fix(docs): Error::EndTagWithAttributes should be emitted by emit_current_tag | Martin Fischer | |
2023-08-19 | break!: remove StartTag::next_state | Martin Fischer | |
You shouldn't have to manually match tokens yielded by the tokenizer iterator just to correctly handle state transitions. A better NaiveParser API will be introduced. | |||
2023-08-19 | break!: remove set_last_start_tag from Emitter | Martin Fischer | |
2021-12-05 | rename to html5tokenizer, bump version (v0.4.0) | Martin Fischer | |
2021-12-05 | spans: get rid of code duplication by introducing Span trait | Martin Fischer | |
2021-12-05 | spans: refactor to avoid one clone() | Martin Fischer | |
2021-12-05 | rename internal emit_error to push_error (to avoid confusion with trait method) | Martin Fischer | |
2021-12-05 | improve duplicate attribute span | Martin Fischer | |
2021-12-05 | refactor: match btree_map::Entry instead of using and_modify closure | Martin Fischer | |
2021-12-05 | spans: slightly refactor DefaultEmitter | Martin Fischer | |
2021-12-05 | spans: add spans to Token::Error | Martin Fischer | |
2021-12-05 | spans: fix spans for quoted attribute values | Martin Fischer | |
2021-12-05 | spans: support attribute values | Martin Fischer | |
2021-12-05 | spans: support attribute names | Martin Fischer | |
2021-12-05 | spans: add span tests | Martin Fischer | |
2021-12-05 | spans: start implementing SpanEmitter | Martin Fischer | |
2021-12-05 | spans: introduce PosTracker | Martin Fischer | |
2021-12-05 | spans: introduce GetPos trait | Martin Fischer | |
2021-12-05 | spans: rename to SpanEmitter, adjust generics | Martin Fischer | |
2021-12-05 | spans: copy DefaultEmitter to new span module | Martin Fischer | |
2021-12-05 | spans: make Emitter generic over Reader | Martin Fischer | |
2021-12-05 | spans: make Emitter generic over Span | Martin Fischer | |
2021-12-05 | fix wrong state transition in ScriptDataLessThanSign state | Martin Fischer | |
Before this commit, the following happened: `% printf '<script><b>test</b></script>' \| cargo run --example=switch-state` printed `StartTag(StartTag { self_closing: false, name: "script", attributes: {} })`, `String("<b>test")`, `EndTag(EndTag { name: "b" })`, `EndTag(EndTag { name: "script" })`. This is wrong: after a `<script>` tag we want to switch to the ScriptData state (instead of the Data state). This commit fixes this implementation error, making the above command produce the expected output: `StartTag(StartTag { self_closing: false, name: "script", attributes: {} })`, `String("<b>test</b>")`, `EndTag(EndTag { name: "script" })`. | |||
2021-12-05 | introduce StartTag::next_state | Martin Fischer | |
Closes #11. | |||
2021-12-05 | allow setting the Tokenizer to Data, PlainText, RcData, RawText and ScriptData states | Martin Fischer | |
2021-12-05 | prepare for introduction of public State enum | Martin Fischer | |
2021-12-03 | fix new clippy | Markus Unterwaditzer | |
2021-11-28 | clarify what html5gum isn't, fix #5 | Markus Unterwaditzer | |
2021-11-27 | fix crash in try_read_string | Markus Unterwaditzer | |
2021-11-27 | split up match-arms and tokenizer to isolate some tokenizer-internal state | Markus Unterwaditzer | |
purpose: don't want to expose self.to_reconsume to the consume() method | |||
2021-11-26 | Read html from io::BufRead (#8) | Markus Unterwaditzer | |
2021-11-26 | clean up reader interface | Markus Unterwaditzer | |
2021-11-24 | hello world | Markus Unterwaditzer | |
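
The 2023-08-19 commits that remove `Never` and `InfallibleTokenizer` rely on two standard library facts: `std::convert::Infallible` is an error type with no values, and `Result` implements `IntoIterator`, so `Iterator::flatten` unwraps an iterator of results. The following is a minimal sketch of that idea using only the standard library. `ToyToken`, `toy_tokens` and `unwrap_all` are hypothetical names for illustration and are not part of the crate's API.

```rust
use std::convert::Infallible;

// Hypothetical stand-in for a token type; not the crate's real `Token`.
#[derive(Debug, PartialEq)]
enum ToyToken {
    StartTag(String),
    Text(String),
}

// Hypothetical token source whose error type is `Infallible`,
// i.e. it can never actually yield an `Err`.
fn toy_tokens() -> impl Iterator<Item = Result<ToyToken, Infallible>> {
    vec![
        Ok(ToyToken::StartTag("script".to_string())),
        Ok(ToyToken::Text("<b>test</b>".to_string())),
    ]
    .into_iter()
}

// `Result<T, E>` implements `IntoIterator`, yielding the `Ok` value if
// there is one, so `flatten` turns an iterator of results into an
// iterator of plain values without a dedicated wrapper type.
fn unwrap_all() -> Vec<ToyToken> {
    toy_tokens().flatten().collect()
}

fn main() {
    for token in unwrap_all() {
        println!("{:?}", token);
    }
}
```

Note that `flatten` silently skips `Err` items, so this pattern is only lossless when the error type is uninhabited, as `Infallible` is.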