Age    Commit message    Author
2023-08-19refactor: decouple html5lib_tests from html5tokenizerMartin Fischer
Previously we mapped the test tokens to our own token type. Now we do the reverse, which makes more sense because it lets us add more detailed fields to our own token variants without worrying that these fields are absent from the html5lib test data. (An alternative would be to normalize the values of these fields to some arbitrary value so that PartialEq still holds, but seeing such normalized fields in the diff printed by pretty_assertions on a test failure would be quite confusing.)
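A minimal sketch of the mapping direction described above, converting the tokenizer's own tokens into the test-suite token type. `Token`, `TestToken`, and the `span` field are hypothetical stand-ins chosen for illustration, not the crates' actual types.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a token as described by the html5lib test data.
#[derive(Debug, PartialEq)]
enum TestToken {
    StartTag { name: String },
    Character(String),
}

/// Hypothetical stand-in for the tokenizer's own token. It carries an extra
/// `span` field that the test data knows nothing about.
#[derive(Debug)]
enum Token {
    StartTag { name: String, span: Range<usize> },
    Text(String),
}

impl From<Token> for TestToken {
    fn from(token: Token) -> Self {
        match token {
            // The extra field is dropped during conversion, so it never has
            // to be normalized to an arbitrary value for comparison.
            Token::StartTag { name, span: _ } => TestToken::StartTag { name },
            Token::Text(text) => TestToken::Character(text),
        }
    }
}

fn main() {
    let produced = Token::StartTag { name: "p".to_string(), span: 0..3 };
    let expected = TestToken::StartTag { name: "p".to_string() };
    assert_eq!(TestToken::from(produced), expected);

    let text = Token::Text("hi".to_string());
    assert_eq!(TestToken::from(text), TestToken::Character("hi".to_string()));
}
```

Because only the test-side type needs `PartialEq` here, a failing assertion shows the test data's shape without any placeholder values.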
2023-08-19chore(html5lib_tests): simplify control flowMartin Fischer
2023-08-19refactor: split off reusable html5lib_tests crateMartin Fischer
2023-08-19refactor: separate test logic from html5lib-test parsingMartin Fischer
2023-08-19break!: privatize PosTrackingReader fieldsMartin Fischer
2023-08-19break!: rename PosTracker to PosTrackingReaderMartin Fischer
2023-08-19break!: remove Never in favor of std::convert::InfallibleMartin Fischer
This change is a backport of 04e6cbe[1] from html5gum.

[1]: https://github.com/untitaker/html5gum/commit/04e6cbe44bb7a388bd61d1c9cfe4c618eb3b0e29
2023-08-19break!: remove InfallibleTokenizer in favor of Iterator::flattenMartin Fischer
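A small sketch of why a dedicated wrapper is unnecessary when the error type is `std::convert::Infallible`: `Result` implements `IntoIterator`, so `Iterator::flatten` unwraps every `Ok` value. The `Token` type and the `tokenize` function are hypothetical placeholders, not the crate's API.

```rust
use std::convert::Infallible;

/// Hypothetical token type used only for this illustration.
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    Text(String),
}

/// Hypothetical tokenizer output whose error type can never be constructed.
fn tokenize() -> impl Iterator<Item = Result<Token, Infallible>> {
    vec![
        Ok(Token::StartTag("p".to_string())),
        Ok(Token::Text("hi".to_string())),
    ]
    .into_iter()
}

fn main() {
    // Each `Result` yields its `Ok` value when iterated. Since `Infallible`
    // has no values, flattening cannot silently drop any item here.
    let tokens: Vec<Token> = tokenize().flatten().collect();
    assert_eq!(tokens.len(), 2);
    println!("{:?}", tokens);
}
```

Note that `flatten` would silently skip `Err` items for a fallible error type, so this shortcut is only appropriate when the error is uninhabited.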
2023-08-19docs: remove Tokenizer::new examples from Reader docsMartin Fischer
2023-08-19break!: rename Readable to IntoReaderMartin Fischer
The trait of the standard library is also called IntoIterator and not Iterable.
2023-08-19fix(docs): remove outdated list of Readable implsMartin Fischer
dced8066f77f570dd3e396ec3570c71aa86c454e introduced a Readable impl for std::io::BufReader. Manually listing impls in a doc comment is a bad idea: such lists go out of date, and they are unnecessary because rustdoc automatically lists all implementations on the trait page.
2023-08-19fix(docs): fix Error variant doc saying `$literal`Martin Fischer
2023-08-19fix(docs): Span is a byte range (not character range)Martin Fischer
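To illustrate the distinction between byte ranges and character ranges, here is a short sketch using plain `std::ops::Range<usize>`. The `find_span` helper is hypothetical and only mimics what a byte-based span looks like. With non-ASCII input, byte offsets and character offsets diverge.

```rust
use std::ops::Range;

/// Hypothetical helper: returns the byte range of `needle` within `haystack`.
fn find_span(haystack: &str, needle: &str) -> Option<Range<usize>> {
    // `str::find` returns a byte offset, and `str::len` is a byte length.
    let start = haystack.find(needle)?;
    Some(start..start + needle.len())
}

fn main() {
    // 'é' and 'ö' each occupy two bytes in UTF-8.
    let html = "<p>héllo <b>wörld</b></p>";
    let span = find_span(html, "<b>").expect("needle is present");

    // A byte range can be used directly to slice the source string.
    assert_eq!(&html[span.clone()], "<b>");

    // The character offset of the same position is smaller than the byte offset.
    let char_offset = html[..span.start].chars().count();
    println!("byte start = {}, char start = {}", span.start, char_offset);
}
```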
2023-08-19fix(docs): StartTag is a start tagMartin Fischer
2023-08-19fix(docs): Error::EndTagWithAttributes should be emitted by emit_current_tagMartin Fischer
2023-08-19test: enable previously skipped tokenizer testMartin Fischer
2023-08-19break!: remove StartTag::next_stateMartin Fischer
You shouldn't have to manually match tokens yielded by the tokenizer iterator just to correctly handle state transitions. A better NaiveParser API will be introduced.
2023-08-19break!: remove set_last_start_tag from EmitterMartin Fischer
2023-08-19refactor: move html5lib test to own crate to fix `cargo test`Martin Fischer
Previously `cargo test` failed because it ran the test_html5lib integration test, which depends on the integration-tests feature. You therefore always had to run `cargo test` with `--features integration-tests` or `--all-features`, which was annoying. This commit moves the integration tests to another crate so that the dependency on the feature is declared properly and `cargo test` just works and runs the test.
2023-08-19chore: drop test-generator dev-dependencyMartin Fischer
I want to move the test_html5lib integration test to a separate crate so that it can properly depend on the integration-tests feature and `cargo test` just works and runs the integration test. (Currently `cargo test` fails since test_html5lib depends on that feature.) However, test_html5lib currently depends on the test-generator crate, which doesn't support Cargo workspaces[1] and appears to be unmaintained. This commit therefore drops the test-generator dev-dependency.

[1]: https://github.com/frehberg/test-generator/issues/6
2021-12-05rename to html5tokenizer, bump version (tag: v0.4.0)Martin Fischer
2021-12-05spans: get rid of code duplication by introducing Span traitMartin Fischer
2021-12-05spans: refactor to avoid one clone()Martin Fischer
2021-12-05rename internal emit_error to push_error (to avoid confusion with trait method)Martin Fischer
2021-12-05improve duplicate attribute spanMartin Fischer
2021-12-05refactor: match btree_map::Entry instead of using and_modify closureMartin Fischer
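A sketch of the pattern named in this commit subject: matching on `btree_map::Entry` rather than chaining `and_modify` with a closure. The `insert_attribute` function is a hypothetical example, assuming attributes are stored in a `BTreeMap<String, String>` and that a duplicate attribute keeps the first value, as the HTML specification requires.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical helper: inserts an attribute and reports whether it was new.
/// A duplicate name leaves the first value untouched.
fn insert_attribute(attrs: &mut BTreeMap<String, String>, name: &str, value: &str) -> bool {
    match attrs.entry(name.to_string()) {
        Entry::Vacant(slot) => {
            slot.insert(value.to_string());
            true
        }
        // Matching makes the duplicate case an ordinary branch, so the caller
        // can report an error without capturing state in a closure.
        Entry::Occupied(_) => false,
    }
}

fn main() {
    let mut attrs = BTreeMap::new();
    assert!(insert_attribute(&mut attrs, "class", "a"));
    assert!(!insert_attribute(&mut attrs, "class", "b"));
    assert_eq!(attrs.get("class").map(String::as_str), Some("a"));
}
```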
2021-12-05spans: slightly refactor DefaultEmitterMartin Fischer
2021-12-05spans: add spans to Token::ErrorMartin Fischer
2021-12-05spans: fix spans for quoted attribute valuesMartin Fischer
2021-12-05spans: support attribute valuesMartin Fischer
2021-12-05spans: support attribute namesMartin Fischer
2021-12-05spans: add span testsMartin Fischer
2021-12-05spans: start implementing SpanEmitterMartin Fischer
2021-12-05spans: introduce PosTrackerMartin Fischer
2021-12-05spans: introduce GetPos traitMartin Fischer
2021-12-05spans: rename to SpanEmitter, adjust genericsMartin Fischer
2021-12-05spans: copy DefaultEmitter to new span moduleMartin Fischer
2021-12-05spans: make Emitter generic over ReaderMartin Fischer
2021-12-05spans: make Emitter generic over SpanMartin Fischer
2021-12-05fix wrong state transition in ScriptDataLessThanSign stateMartin Fischer
Before the following happened:

    % printf '<script><b>test</b></script>' | cargo run --example=switch-state
    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
    String("<b>test")
    EndTag(EndTag { name: "b" })
    EndTag(EndTag { name: "script" })

Which is obviously wrong. After a <script> tag we want to switch to the ScriptData state (instead of the Data state). This commit fixes this implementation error, making the above command produce the expected output of:

    StartTag(StartTag { self_closing: false, name: "script", attributes: {} })
    String("<b>test</b>")
    EndTag(EndTag { name: "script" })
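For orientation, the transitions out of the "script data less-than sign" state, as defined by the WHATWG HTML tokenization rules, can be sketched as below. The `State` enum and the function name are hypothetical and are not taken from the crate. The key point is that the fallback case returns to script data, not to the data state.

```rust
/// Hypothetical subset of tokenizer states, named after the specification.
#[derive(Debug, PartialEq, Clone, Copy)]
enum State {
    ScriptData,
    ScriptDataEndTagOpen,
    ScriptDataEscapeStart,
}

/// Returns the state entered from "script data less-than sign" given the
/// next input character (`None` stands for end of input).
fn after_script_data_less_than_sign(next: Option<char>) -> State {
    match next {
        Some('/') => State::ScriptDataEndTagOpen,
        Some('!') => State::ScriptDataEscapeStart,
        // Anything else: the '<' is emitted as text and the character is
        // reconsumed in the script data state, so "<b>" stays part of the script.
        _ => State::ScriptData,
    }
}

fn main() {
    // "<b" inside a script element must not start a tag.
    assert_eq!(after_script_data_less_than_sign(Some('b')), State::ScriptData);
    // "</" may begin the closing script tag.
    assert_eq!(
        after_script_data_less_than_sign(Some('/')),
        State::ScriptDataEndTagOpen
    );
}
```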
2021-12-05introduce StartTag::next_stateMartin Fischer
Closes #11.
2021-12-05allow setting the Tokenizer to Data, PlainText, RcData, RawText and ScriptData statesMartin Fischer
2021-12-05prepare for introduction of public State enumMartin Fischer
2021-12-03fix new clippyMarkus Unterwaditzer
2021-12-03Fix typo and add example (#9)Martin Fischer
2021-11-28version 0.2.1 (tag: html5gum-0.2.1)Markus Unterwaditzer
2021-11-28update wordingMarkus Unterwaditzer
2021-11-28restructure readmeMarkus Unterwaditzer
2021-11-28add another exampleMarkus Unterwaditzer
2021-11-28update wording againMarkus Unterwaditzer