# html5tokenizer
This library is a fork of the tokenizer from [html5ever] with the following
changes:
* The dependencies on [markup5ever], [tendril], [mac] and [log] were removed.
This spares you about 40 build dependencies and the `unsafe` code from Tendril.
* The dependency on [phf] was made optional: if you don't need to resolve named
entities like `&amp;`, you can disable the `named-entities` feature, in which
case this library has no dependencies other than the standard library.
* This library takes care of appropriately switching tokenizer states based on
tag names (e.g. for `script` and `style`); with the html5ever tokenizer
you had to do this yourself.
* The API has been cleaned up a bit (e.g. the internal tokenizer state enums
are no longer public).
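
To illustrate what switching tokenizer states based on tag names involves, here is a minimal standalone sketch. It does not use this crate's API: the `State` enum and the `state_after_start_tag` helper are hypothetical names invented for this example, and the tag-to-state mapping follows the HTML Standard as I understand it (ignoring `noscript`, which depends on whether scripting is enabled).

```rust
/// Tokenizer states relevant to this sketch. The names mirror the HTML
/// Standard; they are NOT types exported by this crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Data,
    RcData,
    RawText,
    ScriptData,
    PlainText,
}

/// Hypothetical helper: returns the state a tokenizer should enter right
/// after it has emitted a start tag with the given name.
fn state_after_start_tag(tag_name: &str) -> State {
    // HTML tag names are ASCII case-insensitive.
    match tag_name.to_ascii_lowercase().as_str() {
        // Character references are still decoded, but markup is not parsed.
        "title" | "textarea" => State::RcData,
        // Everything up to the matching end tag is plain text.
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => State::RawText,
        // Like raw text, with extra rules for `<!--` inside scripts.
        "script" => State::ScriptData,
        // The rest of the input is text; there is no way back.
        "plaintext" => State::PlainText,
        // All other elements continue in the ordinary data state.
        _ => State::Data,
    }
}
```

Without such a switch, a tokenizer would treat `<` inside a `script` or `style` element as the start of a tag, which is the bookkeeping a caller of the html5ever tokenizer has to handle.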

If you want to parse HTML into a tree (DOM), you should by all means use
html5ever. This crate is merely for those who only want an HTML5 tokenizer and
seek to minimize their build dependencies (html5ever pulls in 56).

[html5ever]: https://crates.io/crates/html5ever
[markup5ever]: https://crates.io/crates/markup5ever
[tendril]: https://crates.io/crates/tendril
[mac]: https://crates.io/crates/mac
[log]: https://crates.io/crates/log
[phf]: https://crates.io/crates/phf
## Credits
Thanks to the developers of html5ever for their awesome parser!