author | Martin Fischer <martin@push-f.com> | 2021-11-29 17:06:28 +0100 |
---|---|---|
committer | Martin Fischer <martin@push-f.com> | 2021-11-30 11:22:35 +0100 |
commit | d7f353bd52de6d8f647f7dfa12fde10917266ada (patch) | |
tree | 946454e226537e4b206bbb18b6948c534bb092d2 | |
parent | 26a4b848cd83ed5fea3fb2b420d1295b784f449b (diff) |
docs: update README
-rw-r--r-- | README.md | 39 |
-rw-r--r-- | src/lib.rs | 6 |
2 files changed, 22 insertions, 23 deletions
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1,30 +1,33 @@
 # html5tokenizer
 
-This crate provides the tokenizer from [html5ever](https://crates.io/crates/html5ever),
-repackaged with all of its dependencies removed. The following dependencies were removed:
+This library is a fork of the tokenizer from [html5ever] with the following
+changes:
 
-* [markup5ever](https://crates.io/crates/markup5ever)
-  `buffer_queue`, `smallcharset` and the entity data were merged into the source code
+* The dependencies on [markup5ever], [tendril], [mac] and [log] were removed.
+  This spares you about 40 build dependencies and the `unsafe` code from Tendril.
 
-* [tendril](https://crates.io/crates/tendril)
-  According to its README it contains "a substantial amount of unsafe code".
-  This fork replaces the tendril strings with plain old `std::string::String`s.
+* The dependency on [phf] was made optional: if you don't need to resolve named
+  entities like `&amp;`, you can disable the `named-entities` feature, in which
+  case this library does not have any dependencies (other than the standard
+  library).
 
-* [mac](https://crates.io/crates/mac)
-  The only macros actually needed (`format_if` and `test_eq`) were merged into
-  the source code.
+* This library takes care of appropriately switching tokenizer states based on
+  tag names (e.g. for `script` and `styles`) ... with the html5ever tokenizer
+  you had to do this yourself.
 
-* [log](https://crates.io/crates/log)
-  Was only used for debug output.
+* The API has been cleaned up a bit (e.g. the internal tokenizer state enums
+  are no longer public).
 
 If you want to parse HTML into a tree (DOM) you should by all means use
 html5ever, this crate is merely for those who only want an HTML5 tokenizer and
-seek to minimize their compile dependencies (html5ever pulls in 56).
-
-To efficiently resolve named entities like `&amp;` the tokenizer uses
-[phf](https://crates.io/crates/phf) for a compile-time static map. If you
-don't need to resolve named entities, you can avoid the `phf` dependency
-by disabling the `named-entities` feature (which is enabled by default).
+seek to minimize their build dependencies (html5ever pulls in 56).
+
+[html5ever]: https://crates.io/crates/html5ever
+[markup5ever]: https://crates.io/crates/markup5ever
+[tendril]: https://crates.io/crates/tendril
+[mac]: https://crates.io/crates/mac
+[log]: https://crates.io/crates/log
+[phf]: https://crates.io/crates/phf
 
 ## Credits
 
diff --git a/src/lib.rs b/src/lib.rs
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -7,11 +7,7 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.
 
-/*!
-The HTML5 tokenizer from the [html5ever](https://crates.io/crates/html5ever)
-crate, repackaged with its dependencies removed.
-*/
-
+#![doc = include_str!("../README.md")]
 #![crate_type = "dylib"]
 #![cfg_attr(test, deny(warnings))]
 #![allow(unused_parens)]
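
The new README says the library switches tokenizer states based on tag names, but the commit does not show how. A minimal sketch of the idea, assuming the tag-to-state mapping from the HTML Standard; the `State` enum and `state_after_start_tag` are hypothetical names for illustration, not this crate's API (its state enums are no longer public):

```rust
/// Tokenizer states a start tag can switch to. The names follow the HTML
/// Standard; they are illustrative and not this crate's actual enums.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Data,
    Rcdata,
    Rawtext,
    ScriptData,
    Plaintext,
}

/// Hypothetical helper: returns the state the tokenizer should enter after
/// emitting a start tag with the given lowercase name.
fn state_after_start_tag(tag_name: &str, scripting_enabled: bool) -> State {
    match tag_name {
        // Character references are still decoded, but tags are not parsed.
        "title" | "textarea" => State::Rcdata,
        // Content is raw text up to the matching end tag.
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => State::Rawtext,
        // <noscript> is raw text only when scripting is enabled.
        "noscript" if scripting_enabled => State::Rawtext,
        "script" => State::ScriptData,
        "plaintext" => State::Plaintext,
        // Every other element keeps the tokenizer in the data state.
        _ => State::Data,
    }
}

fn main() {
    for tag in ["script", "style", "title", "p"] {
        println!("<{tag}> -> {:?}", state_after_start_tag(tag, false));
    }
}
```

Without such a switch, the `<` characters inside a `<script>` or `<style>` body would be tokenized as markup, which is why callers of the html5ever tokenizer had to handle it themselves.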