path: root/src
Age  Commit message  Author
2023-08-19  refactor!: remove current_is_appropriate_end_tag_token from Emitter  Martin Fischer
2023-08-19  refactor: proxy essential Emitter methods through Tokenizer  Martin Fischer
2023-08-19  break!: stop re-exporting reader traits & types  Martin Fischer
This is primarily done to make the rustdoc more readable (by grouping Reader, IntoReader, StringReader and BufReadReader in the reader module). Ideally IntoReader is already implemented for your input type and you don't have to concern yourself with these traits / types at all.
2023-08-19  docs: remove `crate::` from link labels  Martin Fischer
2023-08-19  docs: move `produce ("emit")` clue to Emitter doc  Martin Fischer
2023-08-19  break!: merge Tokenizer::new_with_emitter into Tokenizer::new  Martin Fischer
The Tokenizer does not perform any state switching, since proper state switching requires a feedback loop between tokenization and DOM tree building. Using the Tokenizer directly therefore is a bit of a pitfall, since you might not expect it to e.g. tokenize `<script><b>` as:

    StartTag(StartTag { name: "script", .. })
    StartTag(StartTag { name: "b", .. })

Since we don't want to make walking into pitfalls particularly easy, this commit changes the Tokenizer::new method so that you have to specify the Emitter. Since this makes new_with_emitter redundant it is removed.
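A minimal sketch of the pitfall described above, assuming a toy tokenizer rather than this crate's real API: the `Token` enum, the `tokenize` function and the `switch_state` flag are hypothetical names introduced only for illustration. Without state switching, `<script><b>` yields two start tags; with a feedback step of the kind a tree builder would provide, the content after `<script>` is treated as text.

```rust
/// Simplified token type for this sketch (not the crate's real `Token`).
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Tokenizes `input`. With `switch_state == false` everything is handled as
/// ordinary markup; with `true`, a `<script>` start tag switches to a
/// script-data-like mode in which text runs until `</script>`.
fn tokenize(input: &str, switch_state: bool) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    let mut in_script = false;
    while !rest.is_empty() {
        if in_script {
            // Raw text up to the closing tag (or the end of the input).
            let end = rest.find("</script>").unwrap_or(rest.len());
            if end > 0 {
                tokens.push(Token::Text(rest[..end].to_string()));
            }
            rest = &rest[end..];
            in_script = false;
        } else if let Some(after) = rest.strip_prefix("</") {
            let close = after.find('>').unwrap_or(after.len());
            tokens.push(Token::EndTag(after[..close].to_string()));
            rest = after.get(close + 1..).unwrap_or("");
        } else if let Some(after) = rest.strip_prefix('<') {
            let close = after.find('>').unwrap_or(after.len());
            let name = &after[..close];
            // The feedback step: a tree builder would request this switch.
            in_script = switch_state && name == "script";
            tokens.push(Token::StartTag(name.to_string()));
            rest = after.get(close + 1..).unwrap_or("");
        } else {
            // Plain text up to the next `<` (or the end of the input).
            let end = rest.find('<').unwrap_or(rest.len());
            tokens.push(Token::Text(rest[..end].to_string()));
            rest = &rest[end..];
        }
    }
    tokens
}
```

Calling `tokenize("<script><b>", false)` reproduces the surprising pair of start tags, while passing `true` keeps `<b>` as text. The sketch ignores attributes, comments and escaping entirely.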
2023-08-19  docs: move note about Reader impls to Reader trait  Martin Fischer
2023-08-19  break!: remove Default impl for Attribute  Martin Fischer
2023-08-19  break!: remove Default impls for StartTag and EndTag  Martin Fischer
2023-08-19  break!: privatize PosTrackingReader fields  Martin Fischer
2023-08-19  break!: rename PosTracker to PosTrackingReader  Martin Fischer
2023-08-19  break!: remove Never in favor of std::convert::Infallible  Martin Fischer
This change is a backport of 04e6cbe[1] from html5gum.

[1]: https://github.com/untitaker/html5gum/commit/04e6cbe44bb7a388bd61d1c9cfe4c618eb3b0e29
2023-08-19  break!: remove InfallibleTokenizer in favor of Iterator::flatten  Martin Fischer
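A small sketch of why `Iterator::flatten` can stand in for a dedicated infallible wrapper: `Result` implements `IntoIterator`, so flattening an iterator of `Result<T, Infallible>` yields the `Ok` values directly. The `infallible_tokens` and `collect_tokens` functions are hypothetical stand-ins, not part of the crate.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for a tokenizer whose reader can never fail:
/// every item is `Ok`, and the error type has no values.
fn infallible_tokens() -> impl Iterator<Item = Result<String, Infallible>> {
    vec!["a", "b"]
        .into_iter()
        .map(|s| Ok::<String, Infallible>(s.to_string()))
}

/// `Result<T, E>` is `IntoIterator` (yielding the `Ok` value, if any),
/// so `flatten` unwraps each item without a custom wrapper type.
fn collect_tokens() -> Vec<String> {
    infallible_tokens().flatten().collect()
}
```

Note that `flatten` silently drops `Err` items, which is only harmless here because `Infallible` guarantees there are none.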
2023-08-19  docs: remove Tokenizer::new examples from Reader docs  Martin Fischer
2023-08-19  break!: rename Readable to IntoReader  Martin Fischer
The trait of the standard library is also called IntoIterator and not Iterable.
2023-08-19  fix(docs): remove outdated list of Readable impls  Martin Fischer
dced8066f77f570dd3e396ec3570c71aa86c454e introduced a Readable impl for std::io::BufReader. Manually listing impls in a doc comment is a bad idea since such lists will just get out of date and there's no need for that since rustdoc automatically lists all implementations on the trait page.
2023-08-19  fix(docs): fix Error variant doc saying `$literal`  Martin Fischer
2023-08-19  fix(docs): Span is a byte range (not character range)  Martin Fischer
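To illustrate the byte-versus-character distinction this fix refers to, here is a sketch using only the standard library. The `byte_span` helper is a hypothetical name and does not reflect how the crate computes spans.

```rust
use std::ops::Range;

/// Returns the byte range of the first occurrence of `needle` in `haystack`.
/// `str::find` reports byte offsets, so the range can be used to slice the
/// original string directly.
fn byte_span(haystack: &str, needle: &str) -> Option<Range<usize>> {
    haystack
        .find(needle)
        .map(|start| start..start + needle.len())
}
```

For an input such as `"\u{e9}<b>"`, the leading character occupies two bytes in UTF-8, so the tag starts at byte offset 2 even though it is the second character.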
2023-08-19  fix(docs): StartTag is a start tag  Martin Fischer
2023-08-19  fix(docs): Error::EndTagWithAttributes should be emitted by emit_current_tag  Martin Fischer
2023-08-19  break!: remove StartTag::next_state  Martin Fischer
You shouldn't manually have to match tokens yielded by the tokenizer iterator just to correctly handle state transitions. A better NaiveParser API will be introduced.
2023-08-19  break!: remove set_last_start_tag from Emitter  Martin Fischer
2021-12-05  rename to html5tokenizer, bump version (tag: v0.4.0)  Martin Fischer
2021-12-05  spans: get rid of code duplication by introducing Span trait  Martin Fischer
2021-12-05  spans: refactor to avoid one clone()  Martin Fischer
2021-12-05  rename internal emit_error to push_error (to avoid confusion with trait method)  Martin Fischer
2021-12-05  improve duplicate attribute span  Martin Fischer
2021-12-05  refactor: match btree_map::Entry instead of using and_modify closure  Martin Fischer
2021-12-05  spans: slightly refactor DefaultEmitter  Martin Fischer
2021-12-05  spans: add spans to Token::Error  Martin Fischer
2021-12-05  spans: fix spans for quoted attribute values  Martin Fischer
2021-12-05  spans: support attribute values  Martin Fischer
2021-12-05  spans: support attribute names  Martin Fischer
2021-12-05  spans: add span tests  Martin Fischer
2021-12-05  spans: start implementing SpanEmitter  Martin Fischer
2021-12-05  spans: introduce PosTracker  Martin Fischer
2021-12-05  spans: introduce GetPos trait  Martin Fischer
2021-12-05  spans: rename to SpanEmitter, adjust generics  Martin Fischer
2021-12-05  spans: copy DefaultEmitter to new span module  Martin Fischer
2021-12-05  spans: make Emitter generic over Reader  Martin Fischer
2021-12-05  spans: make Emitter generic over Span  Martin Fischer
2021-12-05  fix wrong state transition in ScriptDataLessThanSign state  Martin Fischer
Before the following happened:

    % printf '<script><b>test</b></script>' | cargo run --example=switch-state
    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
    String("<b>test")
    EndTag(EndTag { name: "b" })
    EndTag(EndTag { name: "script" })

Which is obviously wrong. After a <script> tag we want to switch to the ScriptData state (instead of the Data state). This commit fixes this implementation error, making the above command produce the expected output of:

    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
    String("<b>test</b>")
    EndTag(EndTag { name: "script" })
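The transition in question can be sketched as follows, based on the "script data less-than sign state" rules of the WHATWG HTML tokenizer rather than on this crate's source. The `State` enum and the `script_data_less_than_sign` function are hypothetical names for illustration. The "anything else" arm is the relevant one: it emits `<` and reconsumes in the script data state, not the data state.

```rust
/// The states reachable from the "script data less-than sign" state.
#[derive(Debug, PartialEq, Clone, Copy)]
enum State {
    ScriptData,
    ScriptDataEndTagOpen,
    ScriptDataEscapeStart,
}

/// Given the character following `<` inside script data, returns the next
/// state, the text to emit, and whether that character must be reconsumed.
fn script_data_less_than_sign(next: Option<char>) -> (State, &'static str, bool) {
    match next {
        // `</` may start an end tag (resetting the temporary buffer is omitted).
        Some('/') => (State::ScriptDataEndTagOpen, "", false),
        // `<!` may start an escaped section; both characters are emitted.
        Some('!') => (State::ScriptDataEscapeStart, "<!", false),
        // Anything else: emit `<` and stay in script data, so input such as
        // `<b>` remains text instead of being tokenized as a tag.
        _ => (State::ScriptData, "<", true),
    }
}
```

With this rule, the `<b>` in `<script><b>test</b></script>` stays part of the script text, which matches the expected output quoted in the commit message.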
2021-12-05  introduce StartTag::next_state  Martin Fischer
Closes #11.
2021-12-05  allow setting the Tokenizer to Data, PlainText, RcData, RawText and ScriptData states  Martin Fischer
2021-12-05  prepare for introduction of public State enum  Martin Fischer
2021-12-03  fix new clippy  Markus Unterwaditzer
2021-11-28  clarify what html5gum isn't, fix #5  Markus Unterwaditzer
2021-11-27  fix crash in try_read_string  Markus Unterwaditzer
2021-11-27  split up match-arms and tokenizer to isolate some tokenizer-internal state  Markus Unterwaditzer
purpose: don't want to expose self.to_reconsume to the consume() method
2021-11-26  Read html from io::BufRead (#8)  Markus Unterwaditzer