html5tokenizer - Fork of html5gum with code span support

Age	Commit message	Author
2023-08-19	break!: merge Tokenizer::new_with_emitter into Tokenizer::new	Martin Fischer
	The Tokenizer does not perform any state switching, since proper state switching requires a feedback loop between tokenization and DOM tree building. Using the Tokenizer directly is therefore a bit of a pitfall, since you might not expect it to e.g. tokenize `<script><b>` as:
	    StartTag(StartTag { name: "script", .. })
	    StartTag(StartTag { name: "b", .. })
	Since we don't want to make walking into pitfalls particularly easy, this commit changes the Tokenizer::new method so that you have to specify the Emitter. Since this makes new_with_emitter redundant, it is removed.
2023-08-19	docs: move note about Reader impls to Reader trait	Martin Fischer

2023-08-19	break!: remove Default impl for Attribute	Martin Fischer

2023-08-19	break!: remove Default impls for StartTag and EndTag	Martin Fischer

2023-08-19	break!: privatize PosTrackingReader fields	Martin Fischer

2023-08-19	break!: rename PosTracker to PosTrackingReader	Martin Fischer

2023-08-19	break!: remove Never in favor of std::convert::Infallible	Martin Fischer
	This change is a backport of 04e6cbe[1] from html5gum.
	[1]: https://github.com/untitaker/html5gum/commit/04e6cbe44bb7a388bd61d1c9cfe4c618eb3b0e29
2023-08-19	break!: remove InfallibleTokenizer in favor of Iterator::flatten	Martin Fischer
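	The replacement relies on `Result<T, E>` implementing `IntoIterator` in the standard library, so `Iterator::flatten` can unwrap every `Ok` item. The following is only a minimal sketch of that idea: the `Token` enum and the `tokens` and `collect_tokens` functions are hypothetical stand-ins and not the crate's actual API.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for a tokenizer token (not the crate's real type).
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    String(String),
}

/// Hypothetical token stream whose error type is `Infallible`,
/// i.e. every item is statically known to be `Ok`.
fn tokens() -> impl Iterator<Item = Result<Token, Infallible>> {
    vec![
        Ok(Token::StartTag("b".to_string())),
        Ok(Token::String("test".to_string())),
    ]
    .into_iter()
}

/// `Result` implements `IntoIterator` (yielding the `Ok` value, if any),
/// so `flatten` turns the stream of `Result<Token, Infallible>` into a
/// stream of `Token` without a dedicated wrapper type.
fn collect_tokens() -> Vec<Token> {
    tokens().flatten().collect()
}

fn main() {
    for token in collect_tokens() {
        println!("{:?}", token);
    }
}
```

	Note that `flatten` silently skips `Err` items, so this pattern is only lossless when the error type cannot be constructed, as with `Infallible`.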

2023-08-19	docs: remove Tokenizer::new examples from Reader docs	Martin Fischer

2023-08-19	break!: rename Readable to IntoReader	Martin Fischer
	The trait of the standard library is also called IntoIterator and not Iterable.
2023-08-19	fix(docs): remove outdated list of Readable impls	Martin Fischer
	dced8066f77f570dd3e396ec3570c71aa86c454e introduced a Readable impl for std::io::BufReader. Manually listing impls in a doc comment is a bad idea since such lists will just get out of date and there's no need for that since rustdoc automatically lists all implementations on the trait page.
2023-08-19	fix(docs): fix Error variant doc saying `$literal`	Martin Fischer

2023-08-19	fix(docs): Span is a byte range (not character range)	Martin Fischer

2023-08-19	fix(docs): StartTag is a start tag	Martin Fischer

2023-08-19	fix(docs): Error::EndTagWithAttributes should be emitted by emit_current_tag	Martin Fischer

2023-08-19	break!: remove StartTag::next_state	Martin Fischer
	You shouldn't manually have to match tokens yielded by the tokenizer iterator just to correctly handle state transitions. A better NaiveParser API will be introduced.
2023-08-19	break!: remove set_last_start_tag from Emitter	Martin Fischer

2021-12-05	rename to html5tokenizer, bump version (tag: v0.4.0)	Martin Fischer

2021-12-05	spans: get rid of code duplication by introducing Span trait	Martin Fischer

2021-12-05	spans: refactor to avoid one clone()	Martin Fischer

2021-12-05	rename internal emit_error to push_error (to avoid confusion with trait method)	Martin Fischer

2021-12-05	improve duplicate attribute span	Martin Fischer

2021-12-05	refactor: match btree_map::Entry instead of using and_modify closure	Martin Fischer
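	As a rough illustration of matching on `btree_map::Entry` rather than passing a closure to `and_modify`, here is a minimal sketch. The `insert_attribute` helper and its keep-the-first-value behavior are assumptions made for the example and are not taken from the crate's source.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical helper: inserts an attribute and keeps the first value
/// when the name is duplicated. Returns `false` for a duplicate so the
/// caller can report an error.
fn insert_attribute(attrs: &mut BTreeMap<String, String>, name: &str, value: &str) -> bool {
    match attrs.entry(name.to_string()) {
        Entry::Vacant(vacant) => {
            vacant.insert(value.to_string());
            true
        }
        // Matching on the entry lets the duplicate branch return a value
        // directly, which a closure passed to `and_modify` cannot do.
        Entry::Occupied(_) => false,
    }
}

fn main() {
    let mut attrs = BTreeMap::new();
    println!("{}", insert_attribute(&mut attrs, "id", "first"));
    println!("{}", insert_attribute(&mut attrs, "id", "second"));
    println!("{:?}", attrs);
}
```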

2021-12-05	spans: slightly refactor DefaultEmitter	Martin Fischer

2021-12-05	spans: add spans to Token::Error	Martin Fischer

2021-12-05	spans: fix spans for quoted attribute values	Martin Fischer

2021-12-05	spans: support attribute values	Martin Fischer

2021-12-05	spans: support attribute names	Martin Fischer

2021-12-05	spans: add span tests	Martin Fischer

2021-12-05	spans: start implementing SpanEmitter	Martin Fischer

2021-12-05	spans: introduce PosTracker	Martin Fischer

2021-12-05	spans: introduce GetPos trait	Martin Fischer

2021-12-05	spans: rename to SpanEmitter, adjust generics	Martin Fischer

2021-12-05	spans: copy DefaultEmitter to new span module	Martin Fischer

2021-12-05	spans: make Emitter generic over Reader	Martin Fischer

2021-12-05	spans: make Emitter generic over Span	Martin Fischer

2021-12-05	fix wrong state transition in ScriptDataLessThanSign state	Martin Fischer
	Before the following happened:
	    % printf '<script><b>test</b></script>' | cargo run --example=switch-state
	    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
	    String("<b>test")
	    EndTag(EndTag { name: "b" })
	    EndTag(EndTag { name: "script" })
	Which is obviously wrong. After a <script> tag we want to switch to the ScriptData state (instead of the Data state). This commit fixes this implementation error, making the above command produce the expected output of:
	    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
	    String("<b>test</b>")
	    EndTag(EndTag { name: "script" })
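	The state switch described above can be sketched as a lookup from start tag name to the tokenizer state that should follow it. This is only an illustration based on the element-to-state mapping in the HTML Standard: the `State` enum and the `state_after_start_tag` function are hypothetical names, not the crate's actual API, and `noscript` is left out because its handling depends on the scripting flag.

```rust
/// Tokenizer states named in this log (hypothetical enum, not the crate's definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Data,
    PlainText,
    RcData,
    RawText,
    ScriptData,
}

/// Hypothetical helper: picks the state the tokenizer should switch to
/// after emitting a start tag with the given (lowercase) name.
fn state_after_start_tag(name: &str) -> State {
    match name {
        "title" | "textarea" => State::RcData,
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => State::RawText,
        "script" => State::ScriptData,
        "plaintext" => State::PlainText,
        // Every other element keeps the tokenizer in the Data state.
        _ => State::Data,
    }
}

fn main() {
    for name in ["script", "title", "b"] {
        println!("<{}> -> {:?}", name, state_after_start_tag(name));
    }
}
```

	With a mapping like this, `<b>` inside `<script>` is never tokenized as a tag, because the tokenizer has already left the Data state by the time it reads it.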
2021-12-05	introduce StartTag::next_state	Martin Fischer
	Closes #11.
2021-12-05	allow setting the Tokenizer to Data, PlainText, RcData, RawText and ScriptData states	Martin Fischer

2021-12-05	prepare for introduction of public State enum	Martin Fischer

2021-12-03	fix new clippy	Markus Unterwaditzer

2021-11-28	clarify what html5gum isn't, fix #5	Markus Unterwaditzer

2021-11-27	fix crash in try_read_string	Markus Unterwaditzer

2021-11-27	split up match-arms and tokenizer to isolate some tokenizer-internal state	Markus Unterwaditzer
	purpose: don't want to expose self.to_reconsume to the consume() method
2021-11-26	Read html from io::BufRead (#8)	Markus Unterwaditzer

2021-11-26	clean up reader interface	Markus Unterwaditzer

2021-11-24	hello world	Markus Unterwaditzer