# html5tokenizer

This crate provides the tokenizer from [html5ever](https://crates.io/crates/html5ever),
repackaged with nearly all of its dependencies removed. The following dependencies were removed:

* [markup5ever](https://crates.io/crates/markup5ever)  
  `buffer_queue`, `smallcharset` and the entity data were merged into the source code.

* [tendril](https://crates.io/crates/tendril)  
  According to its README it contains "a substantial amount of unsafe code".
  This fork replaces the tendril strings with plain old `std::string::String`s.

* [mac](https://crates.io/crates/mac)  
  The only macros actually needed (`format_if` and `test_eq`) were merged into
  the source code.

* [log](https://crates.io/crates/log)  
  Was only used for debug output.

If you want to parse HTML into a tree (DOM), you should by all means use
html5ever. This crate is merely for those who only want an HTML5 tokenizer and
seek to minimize their compile-time dependencies (html5ever pulls in 56).

To efficiently resolve named entities like `&amp;`, the tokenizer uses
[phf](https://crates.io/crates/phf) for a compile-time static map. If you
don't need to resolve named entities, you can avoid the `phf` dependency
by disabling the `named-entities` feature (which is enabled by default).
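
As a rough sketch, disabling the feature in a downstream `Cargo.toml` could look
like the following. The `"*"` version requirement is only a placeholder for
illustration, not a recommendation; pin whichever release you actually use.

```toml
[dependencies]
# Placeholder version requirement; replace it with the release you depend on.
# `default-features = false` turns off the default `named-entities` feature,
# which in turn drops the `phf` dependency.
html5tokenizer = { version = "*", default-features = false }
```

With the feature off, the tokenizer no longer resolves named entities, so this
only suits callers that do not need them.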

## Credits

Thanks to the developers of html5ever for their awesome parser!