author     Martin Fischer <martin@push-f.com>  2021-11-29 17:06:28 +0100
committer  Martin Fischer <martin@push-f.com>  2021-11-30 11:22:35 +0100
commit     d7f353bd52de6d8f647f7dfa12fde10917266ada
tree       946454e226537e4b206bbb18b6948c534bb092d2
parent     26a4b848cd83ed5fea3fb2b420d1295b784f449b
docs: update README
-rw-r--r--  README.md   39
-rw-r--r--  src/lib.rs   6
2 files changed, 22 insertions, 23 deletions
diff --git a/README.md b/README.md
index 193b43d..91265b0 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,33 @@
# html5tokenizer
-This crate provides the tokenizer from [html5ever](https://crates.io/crates/html5ever),
-repackaged with all of its dependencies removed. The following dependencies were removed:
+This library is a fork of the tokenizer from [html5ever] with the following
+changes:
-* [markup5ever](https://crates.io/crates/markup5ever)
- `buffer_queue`, `smallcharset` and the entity data were merged into the source code
+* The dependencies on [markup5ever], [tendril], [mac] and [log] were removed.
+ This spares you about 40 build dependencies and the `unsafe` code from Tendril.
-* [tendril](https://crates.io/crates/tendril)
- According to its README it contains "a substantial amount of unsafe code".
- This fork replaces the tendril strings with plain old `std::string::String`s.
+* The dependency on [phf] was made optional: if you don't need to resolve named
+ entities like `&amp;`, you can disable the `named-entities` feature, in which
+ case this library does not have any dependencies (other than the standard
+ library).
-* [mac](https://crates.io/crates/mac)
- The only macros actually needed (`format_if` and `test_eq`) were merged into
- the source code.
+* This library takes care of appropriately switching tokenizer states based on
+ tag names (e.g. for `script` and `style`); with the html5ever tokenizer
+ you had to do this yourself.
-* [log](https://crates.io/crates/log)
- Was only used for debug output.
+* The API has been cleaned up a bit (e.g. the internal tokenizer state enums
+ are no longer public).
If you want to parse HTML into a tree (DOM) you should by all means use
html5ever, this crate is merely for those who only want an HTML5 tokenizer and
-seek to minimize their compile dependencies (html5ever pulls in 56).
-
-To efficiently resolve named entities like `&amp;` the tokenizer uses
-[phf](https://crates.io/crates/phf) for a compile-time static map. If you
-don't need to resolve named entities, you can avoid the `phf` dependency
-by disabling the `named-entities` feature (which is enabled by default).
+seek to minimize their build dependencies (html5ever pulls in 56).
+
+[html5ever]: https://crates.io/crates/html5ever
+[markup5ever]: https://crates.io/crates/markup5ever
+[tendril]: https://crates.io/crates/tendril
+[mac]: https://crates.io/crates/mac
+[log]: https://crates.io/crates/log
+[phf]: https://crates.io/crates/phf
## Credits
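The README's bullet about the optional `phf` dependency can be illustrated with a small sketch of how a crate might gate its named-entity table behind a Cargo feature. This is hypothetical code, not html5tokenizer's implementation: `resolve_char_ref` and `lookup_named` are illustrative names, and the four-entry `match` stands in for the compile-time map. The sketch relies on the fact that numeric references such as `&#38;` need no table, while named ones such as `&amp;` do. It also simplifies the HTML rules: real numeric references get extra handling (for example 0x80 to 0x9F are remapped and 0 becomes U+FFFD), and a few named references expand to two code points, so `Option<char>` is not enough for a complete implementation.

```rust
// Hypothetical sketch, not html5tokenizer's actual code: resolving the text
// between `&` and `;` of a character reference, with the named-entity table
// compiled in only when the `named-entities` Cargo feature is enabled.

/// Resolves a character reference body such as `#38`, `#x26` or `amp`.
fn resolve_char_ref(body: &str) -> Option<char> {
    if let Some(num) = body.strip_prefix('#') {
        // Numeric references are plain arithmetic and need no lookup table.
        let code = match num.strip_prefix('x').or_else(|| num.strip_prefix('X')) {
            Some(hex) => u32::from_str_radix(hex, 16).ok()?,
            None => num.parse::<u32>().ok()?,
        };
        // char::from_u32 rejects surrogates and values above U+10FFFF.
        return char::from_u32(code);
    }
    lookup_named(body)
}

#[cfg(feature = "named-entities")]
fn lookup_named(name: &str) -> Option<char> {
    // Stand-in for a compile-time map such as one generated with phf;
    // the real HTML table has more than two thousand entries.
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        _ => None,
    }
}

#[cfg(not(feature = "named-entities"))]
fn lookup_named(_name: &str) -> Option<char> {
    // Without the feature there is no table (and no phf dependency),
    // so named references are left unresolved.
    None
}
```

Because the feature is enabled by default, a downstream crate that wants the dependency-free build would declare the dependency with `default-features = false` in its `Cargo.toml`.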
diff --git a/src/lib.rs b/src/lib.rs
index 69557b9..57b7b05 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -7,11 +7,7 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.
-/*!
-The HTML5 tokenizer from the [html5ever](https://crates.io/crates/html5ever)
-crate, repackaged with its dependencies removed.
-*/
-
+#![doc = include_str!("../README.md")]
#![crate_type = "dylib"]
#![cfg_attr(test, deny(warnings))]
#![allow(unused_parens)]
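The README bullet about switching tokenizer states based on tag names refers to a rule from the HTML standard: after certain start tags, the text that follows must be tokenized differently (for example, `<` inside `<script>` does not begin a tag). The sketch below maps a start-tag name to the state to enter next. It is hypothetical: `State` and `state_after_start_tag` are illustrative names, not html5tokenizer's API (the README notes its internal state enums are no longer public). It also ignores details a real tokenizer must handle, such as applying the switch only to HTML-namespace elements (a `<style>` inside `<svg>` does not switch) and remembering the last start tag so the matching end tag can be recognized.

```rust
// Hypothetical sketch, not html5tokenizer's API: which tokenizer state to
// enter after a start tag, following the HTML tree-construction rules.

/// A subset of the HTML tokenizer states (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Data,
    RcData,
    RawText,
    ScriptData,
    PlainText,
}

/// `name` is assumed to be lowercased already, as tokenizers emit tag names.
fn state_after_start_tag(name: &str, scripting_enabled: bool) -> State {
    match name {
        // RCDATA: character references are resolved, but no tags are recognized.
        "title" | "textarea" => State::RcData,
        // RAWTEXT: neither tags nor character references are recognized.
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => State::RawText,
        "noscript" if scripting_enabled => State::RawText,
        // Script data has its own states for handling `<!--` escapes.
        "script" => State::ScriptData,
        // PLAINTEXT never ends: everything after the tag is text.
        "plaintext" => State::PlainText,
        _ => State::Data,
    }
}
```

Keeping this mapping inside the tokenizer is what spares callers the manual state switching that html5ever's tokenizer left to them.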