path: root/src
Date | Commit message | Author
2023-08-19 | break!: merge Tokenizer::new_with_emitter into Tokenizer::new | Martin Fischer
The Tokenizer does not perform any state switching, since proper state switching requires a feedback loop between tokenization and DOM tree building. Using the Tokenizer directly therefore is a bit of a pitfall, since you might not expect it to e.g. tokenize `<script><b>` as:
    StartTag(StartTag { name: "script", .. })
    StartTag(StartTag { name: "b", .. })
Since we don't want to make walking into pitfalls particularly easy, this commit changes the Tokenizer::new method so that you have to specify the Emitter. Since this makes new_with_emitter redundant it is removed.
2023-08-19 | docs: move note about Reader impls to Reader trait | Martin Fischer
2023-08-19 | break!: remove Default impl for Attribute | Martin Fischer
2023-08-19 | break!: remove Default impls for StartTag and EndTag | Martin Fischer
2023-08-19 | break!: privatize PosTrackingReader fields | Martin Fischer
2023-08-19 | break!: rename PosTracker to PosTrackingReader | Martin Fischer
2023-08-19 | break!: remove Never in favor of std::convert::Infallible | Martin Fischer
This change is a backport of 04e6cbe[1] from html5gum.
[1]: https://github.com/untitaker/html5gum/commit/04e6cbe44bb7a388bd61d1c9cfe4c618eb3b0e29
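`std::convert::Infallible` is an enum with no variants, so a `Result<T, Infallible>` can never hold an `Err`, and a custom `Never` type adds nothing over it. The std-only sketch below shows how such a result can be unwrapped without `unwrap()`; `read_all` and `into_ok` are hypothetical helpers for illustration, not part of html5tokenizer.

```rust
use std::convert::Infallible;

/// Hypothetical infallible "reader": copying an in-memory string cannot fail,
/// so the error type is the uninhabited `Infallible`.
fn read_all(input: &str) -> Result<String, Infallible> {
    Ok(input.to_owned())
}

/// Extracts the value without `unwrap()`: the `Err` arm binds a value of an
/// empty type, and `match never {}` proves to the compiler it is unreachable.
fn into_ok<T>(result: Result<T, Infallible>) -> T {
    match result {
        Ok(value) => value,
        Err(never) => match never {},
    }
}

fn main() {
    let text = into_ok(read_all("<p>hi</p>"));
    println!("{text}");
}
```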
2023-08-19 | break!: remove InfallibleTokenizer in favor of Iterator::flatten | Martin Fischer
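Because `Result<T, E>` implements `IntoIterator` (yielding the `Ok` value, or nothing for an `Err`), an iterator of `Result<Token, Infallible>` can be turned into an iterator of plain tokens with `Iterator::flatten`, which makes a dedicated wrapper type unnecessary. A minimal sketch, assuming a hypothetical `Token` enum and `tokenize` function rather than the crate's real ones:

```rust
use std::convert::Infallible;

/// Hypothetical token type standing in for the tokenizer's own.
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    Text(String),
}

/// Hypothetical tokenizer whose items can never be errors.
fn tokenize() -> impl Iterator<Item = Result<Token, Infallible>> {
    let items: Vec<Result<Token, Infallible>> = vec![
        Ok(Token::StartTag("p".to_owned())),
        Ok(Token::Text("hi".to_owned())),
    ];
    items.into_iter()
}

fn main() {
    // `Result<T, E>: IntoIterator<Item = T>`, so `flatten` unwraps each `Ok`.
    let tokens: Vec<Token> = tokenize().flatten().collect();
    println!("{tokens:?}");
}
```

Note that `flatten` silently discards `Err` items, so this is only lossless when the error type is uninhabited, as `Infallible` is.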
2023-08-19 | docs: remove Tokenizer::new examples from Reader docs | Martin Fischer
2023-08-19 | break!: rename Readable to IntoReader | Martin Fischer
The trait of the standard library is also called IntoIterator and not Iterable.
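The new name follows the `Into*` conversion-trait convention: a trait that turns a value into a reader, the way `IntoIterator` turns a value into an iterator. A simplified sketch of that pattern; `Reader`, `IntoReader`, `StrReader` and `count_chars` here are hypothetical simplifications, and the crate's actual definitions may differ.

```rust
/// Hypothetical, simplified reader trait.
trait Reader {
    fn read_char(&mut self) -> Option<char>;
}

/// Conversion trait named after `IntoIterator`: it turns a value *into*
/// a reader, rather than describing a value that "is readable".
trait IntoReader {
    type Reader: Reader;
    fn into_reader(self) -> Self::Reader;
}

/// Hypothetical reader over an in-memory string.
struct StrReader<'a> {
    chars: std::str::Chars<'a>,
}

impl Reader for StrReader<'_> {
    fn read_char(&mut self) -> Option<char> {
        self.chars.next()
    }
}

impl<'a> IntoReader for &'a str {
    type Reader = StrReader<'a>;
    fn into_reader(self) -> StrReader<'a> {
        StrReader { chars: self.chars() }
    }
}

/// Accepts anything convertible into a reader, just as a `for` loop
/// accepts anything implementing `IntoIterator`.
fn count_chars(input: impl IntoReader) -> usize {
    let mut reader = input.into_reader();
    let mut n = 0;
    while reader.read_char().is_some() {
        n += 1;
    }
    n
}

fn main() {
    println!("{}", count_chars("<b>é</b>"));
}
```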
2023-08-19 | fix(docs): remove outdated list of Readable impls | Martin Fischer
dced8066f77f570dd3e396ec3570c71aa86c454e introduced a Readable impl for std::io::BufReader. Manually listing impls in a doc comment is a bad idea since such lists will just get out of date and there's no need for that since rustdoc automatically lists all implementations on the trait page.
2023-08-19 | fix(docs): fix Error variant doc saying `$literal` | Martin Fischer
2023-08-19 | fix(docs): Span is a byte range (not character range) | Martin Fischer
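Because spans are byte ranges, a span can slice the UTF-8 source directly with `&src[span]`, but for non-ASCII input its bounds differ from character offsets. A small std-only illustration; `byte_span_of` is a hypothetical helper:

```rust
use std::ops::Range;

/// Hypothetical helper: the byte range of the first occurrence of `needle`.
/// `str::find` already returns a byte offset, and `needle.len()` is in bytes.
fn byte_span_of(haystack: &str, needle: &str) -> Option<Range<usize>> {
    haystack.find(needle).map(|start| start..start + needle.len())
}

fn main() {
    let src = "<p>é</p>";
    let span = byte_span_of(src, "</p>").unwrap();
    // "é" takes two bytes in UTF-8, so the byte offset of "</p>" is one
    // larger than its character offset.
    let char_start = src[..span.start].chars().count();
    println!("byte span {span:?}, char start {char_start}, text {:?}", &src[span.clone()]);
}
```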
2023-08-19 | fix(docs): StartTag is a start tag | Martin Fischer
2023-08-19 | fix(docs): Error::EndTagWithAttributes should be emitted by emit_current_tag | Martin Fischer
2023-08-19 | break!: remove StartTag::next_state | Martin Fischer
You shouldn't manually have to match tokens yielded by the tokenizer iterator just to correctly handle state transitions. A better NaiveParser API will be introduced.
2023-08-19 | break!: remove set_last_start_tag from Emitter | Martin Fischer
2021-12-05 | rename to html5tokenizer, bump version [v0.4.0] | Martin Fischer
2021-12-05 | spans: get rid of code duplication by introducing Span trait | Martin Fischer
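One way a span trait can remove duplication is by letting a single generic code path serve both span-tracking and non-tracking emitters, with the zero-sized `()` as the "no span" type. This is a hypothetical sketch of that idea (`Span`, `from_range`, `Attribute` and `attribute_at` are illustrative names), not the crate's actual trait:

```rust
use std::ops::Range;

/// Hypothetical span abstraction: the caller picks the span type.
trait Span {
    fn from_range(range: Range<usize>) -> Self;
}

/// No span tracking: the span is the zero-sized `()`.
impl Span for () {
    fn from_range(_range: Range<usize>) -> Self {}
}

/// Byte-range span tracking.
impl Span for Range<usize> {
    fn from_range(range: Range<usize>) -> Self {
        range
    }
}

#[derive(Debug, PartialEq)]
struct Attribute<S> {
    name: String,
    name_span: S,
}

/// Written once, works for every span type.
fn attribute_at<S: Span>(src: &str, range: Range<usize>) -> Attribute<S> {
    Attribute {
        name: src[range.clone()].to_owned(),
        name_span: S::from_range(range),
    }
}

fn main() {
    let src = r#"<a href="x">"#;
    let with_span: Attribute<Range<usize>> = attribute_at(src, 3..7);
    let without_span: Attribute<()> = attribute_at(src, 3..7);
    println!("{with_span:?} {without_span:?}");
}
```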
2021-12-05 | spans: refactor to avoid one clone() | Martin Fischer
2021-12-05 | rename internal emit_error to push_error (to avoid confusion with trait method) | Martin Fischer
2021-12-05 | improve duplicate attribute span | Martin Fischer
2021-12-05 | refactor: match btree_map::Entry instead of using and_modify closure | Martin Fischer
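Matching on `btree_map::Entry` lets each case run its own logic (insert on `Vacant`, record an error on `Occupied`) without a closure that has to capture the error list. The sketch below applies this to duplicate attributes: per the HTML standard, a repeated attribute name is a parse error and the later attribute is dropped. `collect_attributes` is a hypothetical helper, not the crate's emitter code:

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical attribute collector: the first occurrence of a name wins,
/// later duplicates are reported instead of overwriting it.
fn collect_attributes<'a>(
    pairs: &[(&'a str, &'a str)],
) -> (BTreeMap<&'a str, &'a str>, Vec<&'a str>) {
    let mut attributes = BTreeMap::new();
    let mut duplicates = Vec::new();
    for &(name, value) in pairs {
        match attributes.entry(name) {
            Entry::Vacant(entry) => {
                entry.insert(value);
            }
            // Keep the existing value and remember the duplicate name.
            Entry::Occupied(_) => duplicates.push(name),
        }
    }
    (attributes, duplicates)
}

fn main() {
    let (attributes, duplicates) =
        collect_attributes(&[("id", "a"), ("class", "b"), ("id", "c")]);
    println!("{attributes:?} duplicates: {duplicates:?}");
}
```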
2021-12-05 | spans: slightly refactor DefaultEmitter | Martin Fischer
2021-12-05 | spans: add spans to Token::Error | Martin Fischer
2021-12-05 | spans: fix spans for quoted attribute values | Martin Fischer
2021-12-05 | spans: support attribute values | Martin Fischer
2021-12-05 | spans: support attribute names | Martin Fischer
2021-12-05 | spans: add span tests | Martin Fischer
2021-12-05 | spans: start implementing SpanEmitter | Martin Fischer
2021-12-05 | spans: introduce PosTracker | Martin Fischer
2021-12-05 | spans: introduce GetPos trait | Martin Fischer
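A position tracker can be a thin wrapper around the character source that counts consumed bytes, exposed through a small trait so emitters can query the current offset when building spans. Hypothetical sketch; the `GetPos` and `PosTracker` definitions below are illustrative, and the crate's own types (later renamed `PosTrackingReader`) may look different:

```rust
/// Hypothetical position-reporting trait.
trait GetPos {
    fn get_pos(&self) -> usize;
}

/// Hypothetical wrapper that counts how many bytes have been consumed.
struct PosTracker<I> {
    inner: I,
    position: usize,
}

impl<I> PosTracker<I> {
    fn new(inner: I) -> Self {
        PosTracker { inner, position: 0 }
    }
}

impl<I: Iterator<Item = char>> Iterator for PosTracker<I> {
    type Item = char;
    fn next(&mut self) -> Option<char> {
        let c = self.inner.next()?;
        // Advance by the UTF-8 length, not by 1, so positions are byte offsets.
        self.position += c.len_utf8();
        Some(c)
    }
}

impl<I> GetPos for PosTracker<I> {
    fn get_pos(&self) -> usize {
        self.position
    }
}

fn main() {
    let mut reader = PosTracker::new("aé<".chars());
    while let Some(c) = reader.next() {
        println!("{c:?} ends at byte {}", reader.get_pos());
    }
}
```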
2021-12-05 | spans: rename to SpanEmitter, adjust generics | Martin Fischer
2021-12-05 | spans: copy DefaultEmitter to new span module | Martin Fischer
2021-12-05 | spans: make Emitter generic over Reader | Martin Fischer
2021-12-05 | spans: make Emitter generic over Span | Martin Fischer
2021-12-05 | fix wrong state transition in ScriptDataLessThanSign state | Martin Fischer
Before the following happened:
    % printf '<script><b>test</b></script>' | cargo run --example=switch-state
    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
    String("<b>test")
    EndTag(EndTag { name: "b" })
    EndTag(EndTag { name: "script" })
Which is obviously wrong. After a <script> tag we want to switch to the ScriptData state (instead of the Data state). This commit fixes this implementation error, making the above command produce the expected output of:
    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
    String("<b>test</b>")
    EndTag(EndTag { name: "script" })
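The feedback loop between tokenizer and parser can be shown with a toy tokenizer that switches to a script-data state after emitting a `<script>` start tag. This std-only sketch is not html5tokenizer's implementation, and it simplifies the standard's ScriptData states heavily (no attributes, comments, escapes, or case-insensitive end-tag matching); the `switch_state` flag is a hypothetical knob for comparing both behaviors:

```rust
/// Toy token type (hypothetical, simplified).
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

#[derive(Clone, Copy)]
enum State {
    Data,
    ScriptData,
}

/// Toy tokenizer: understands only `<name>`, `</name>` and text.
/// `switch_state` simulates the parser feedback that moves the
/// tokenizer into script data after a `<script>` start tag.
fn tokenize(input: &str, switch_state: bool) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut state = State::Data;
    let mut rest = input;
    while !rest.is_empty() {
        match state {
            State::Data => {
                if let Some(after_lt) = rest.strip_prefix('<') {
                    // Tag: everything up to the next `>` (or end of input).
                    let end = after_lt.find('>').unwrap_or(after_lt.len());
                    let tag = &after_lt[..end];
                    rest = &after_lt[(end + 1).min(after_lt.len())..];
                    if let Some(name) = tag.strip_prefix('/') {
                        tokens.push(Token::EndTag(name.to_owned()));
                    } else {
                        if switch_state && tag == "script" {
                            state = State::ScriptData;
                        }
                        tokens.push(Token::StartTag(tag.to_owned()));
                    }
                } else {
                    // Text: everything up to the next `<` (or end of input).
                    let end = rest.find('<').unwrap_or(rest.len());
                    tokens.push(Token::Text(rest[..end].to_owned()));
                    rest = &rest[end..];
                }
            }
            State::ScriptData => {
                // In script data, markup such as `<b>` is plain text;
                // only `</script` ends it.
                let end = rest.find("</script").unwrap_or(rest.len());
                if end > 0 {
                    tokens.push(Token::Text(rest[..end].to_owned()));
                }
                rest = &rest[end..];
                state = State::Data;
            }
        }
    }
    tokens
}

fn main() {
    for token in tokenize("<script><b>test</b></script>", true) {
        println!("{token:?}");
    }
}
```

With `switch_state` set to `false`, the same input yields separate `StartTag("b")` and `EndTag("b")` tokens, which mirrors the pitfall of running a tokenizer without any parser feedback.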
2021-12-05 | introduce StartTag::next_state | Martin Fischer
Closes #11.
2021-12-05 | allow setting the Tokenizer to Data, PlainText, RcData, RawText and ScriptData states | Martin Fischer
2021-12-05 | prepare for introduction of public State enum | Martin Fischer
2021-12-03 | fix new clippy | Markus Unterwaditzer
2021-11-28 | clarify what html5gum isn't, fix #5 | Markus Unterwaditzer
2021-11-27 | fix crash in try_read_string | Markus Unterwaditzer
2021-11-27 | split up match-arms and tokenizer to isolate some tokenizer-internal state | Markus Unterwaditzer
purpose: don't want to expose self.to_reconsume to the consume() method
2021-11-26 | Read html from io::BufRead (#8) | Markus Unterwaditzer
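Reading from `io::BufRead` lets input be consumed incrementally instead of requiring the whole document in memory first. A minimal sketch, assuming a hypothetical `read_chunks` helper that pulls one line at a time; the crate's actual reader may buffer differently:

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: pulls the input one line at a time from any `BufRead`.
fn read_chunks<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut chunks = Vec::new();
    let mut line = String::new();
    // `read_line` appends to `line` and returns the number of bytes read; 0 means EOF.
    while reader.read_line(&mut line)? != 0 {
        chunks.push(std::mem::take(&mut line));
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    // `Cursor` over a string stands in for a file or socket wrapped in `BufReader`.
    for chunk in read_chunks(Cursor::new("<p>\nhi</p>"))? {
        print!("{chunk}");
    }
    Ok(())
}
```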
2021-11-26 | clean up reader interface | Markus Unterwaditzer
2021-11-24 | hello world | Markus Unterwaditzer