Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 39 |
1 files changed, 21 insertions, 18 deletions
@@ -1,30 +1,33 @@
 # html5tokenizer
 
-This crate provides the tokenizer from [html5ever](https://crates.io/crates/html5ever),
-repackaged with all of its dependencies removed. The following dependencies were removed:
+This library is a fork of the tokenizer from [html5ever] with the following
+changes:
 
-* [markup5ever](https://crates.io/crates/markup5ever)
-  `buffer_queue`, `smallcharset` and the entity data were merged into the source code
+* The dependencies on [markup5ever], [tendril], [mac] and [log] were removed.
+  This spares you about 40 build dependencies and the `unsafe` code from Tendril.
 
-* [tendril](https://crates.io/crates/tendril)
-  According to its README it contains "a substantial amount of unsafe code".
-  This fork replaces the tendril strings with plain old `std::string::String`s.
+* The dependency on [phf] was made optional: if you don't need to resolve named
+  entities like `&amp;`, you can disable the `named-entities` feature, in which
+  case this library does not have any dependencies (other than the standard
+  library).
 
-* [mac](https://crates.io/crates/mac)
-  The only macros actually needed (`format_if` and `test_eq`) were merged into
-  the source code.
+* This library takes care of appropriately switching tokenizer states based on
+  tag names (e.g. for `script` and `styles`) ... with the html5ever tokenizer
+  you had to do this yourself.
 
-* [log](https://crates.io/crates/log)
-  Was only used for debug output.
+* The API has been cleaned up a bit (e.g. the internal tokenizer state enums
+  are no longer public).
 
 If you want to parse HTML into a tree (DOM) you should by all means use
 html5ever, this crate is merely for those who only want an HTML5 tokenizer and
-seek to minimize their compile dependencies (html5ever pulls in 56).
-
-To efficiently resolve named entities like `&amp;` the tokenizer uses
-[phf](https://crates.io/crates/phf) for a compile-time static map. If you
-don't need to resolve named entities, you can avoid the `phf` dependency
-by disabling the `named-entities` feature (which is enabled by default).
+seek to minimize their build dependencies (html5ever pulls in 56).
+
+[html5ever]: https://crates.io/crates/html5ever
+[markup5ever]: https://crates.io/crates/markup5ever
+[tendril]: https://crates.io/crates/tendril
+[mac]: https://crates.io/crates/mac
+[log]: https://crates.io/crates/log
+[phf]: https://crates.io/crates/phf
 
 ## Credits
 
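
Note on the "switching tokenizer states based on tag names" change: the README does not show how the switch is decided, so the following is only a rough sketch of the idea, based on the start-tag rules in the HTML Standard rather than on this crate's code. The names `ContentState` and `state_after_start_tag` are hypothetical and are not part of the html5tokenizer or html5ever API.

```rust
/// Tokenizer states that a start tag can switch the tokenizer into.
/// Names follow the HTML Standard; they are NOT this crate's (now private) enums.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentState {
    Data,
    RcData,
    RawText,
    ScriptData,
    PlainText,
}

/// Hypothetical helper: choose the state that follows a start tag named `tag`.
/// The `noscript` case is left out because it depends on whether scripting
/// is enabled.
fn state_after_start_tag(tag: &str) -> ContentState {
    match tag.to_ascii_lowercase().as_str() {
        // Character references are still decoded, but tags are not recognized.
        "title" | "textarea" => ContentState::RcData,
        // Everything up to the matching end tag is treated as raw text.
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => ContentState::RawText,
        "script" => ContentState::ScriptData,
        "plaintext" => ContentState::PlainText,
        // All other elements leave the tokenizer in the data state.
        _ => ContentState::Data,
    }
}

fn main() {
    for tag in ["p", "title", "style", "script"] {
        println!("<{tag}> -> {:?}", state_after_start_tag(tag));
    }
}
```

With the html5ever tokenizer the caller had to make this kind of decision and set the state manually; according to the README above, this fork does it internally.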