field | value | date |
---|---|---|
author | Markus Unterwaditzer <markus-honeypot@unterwaditzer.net> | 2021-11-26 13:17:39 +0100 |
committer | Markus Unterwaditzer <markus-honeypot@unterwaditzer.net> | 2021-11-26 13:17:39 +0100 |
commit | e1cdb4a6ac40aa562605990d58425978a5dc295b | |
tree | 48ee360700f05443364b95bbaedcfdb809ada6ea /tests/html5lib-tests/tokenizer/README.md | |
parent | bb1e11cc9421c3096d82c4fceb74bb8f0aa82201 | |
remove html5lib-tests
Diffstat (limited to 'tests/html5lib-tests/tokenizer/README.md')
-rw-r--r-- | tests/html5lib-tests/tokenizer/README.md | 107 |
1 files changed, 0 insertions, 107 deletions
diff --git a/tests/html5lib-tests/tokenizer/README.md b/tests/html5lib-tests/tokenizer/README.md
deleted file mode 100644
index 66b81e8..0000000
--- a/tests/html5lib-tests/tokenizer/README.md
+++ /dev/null
@@ -1,107 +0,0 @@
-Tokenizer tests
-===============
-
-The test format is [JSON](http://www.json.org/). This has the advantage
-that the syntax allows backward-compatible extensions to the tests and
-the disadvantage that it is relatively verbose.
-
-Basic Structure
----------------
-
-    {"tests": [
-        {"description": "Test description",
-        "input": "input_string",
-        "output": [expected_output_tokens],
-        "initialStates": [initial_states],
-        "lastStartTag": last_start_tag,
-        "errors": [parse_errors]
-        }
-    ]}
-
-Multiple tests per file are allowed simply by adding more objects to the
-"tests" list.
-
-Each parse error is an object that contains error `code` and one-based
-error location indices: `line` and `col`.
-
-`description`, `input` and `output` are always present. The other values
-are optional.
-
-### Test set-up
-
-`test.input` is a string containing the characters to pass to the
-tokenizer. Specifically, it represents the characters of the **input
-stream**, and so implementations are expected to perform the processing
-described in the spec's **Preprocessing the input stream** section
-before feeding the result to the tokenizer.
-
-If `test.doubleEscaped` is present and `true`, then `test.input` is not
-quite as described above. Instead, it must first be subjected to another
-round of unescaping (i.e., in addition to any unescaping involved in the
-JSON import), and the result of *that* represents the characters of the
-input stream. Currently, the only unescaping required by this option is
-to convert each sequence of the form \\uHHHH (where H is a hex digit)
-into the corresponding Unicode code point. (Note that this option also
-affects the interpretation of `test.output`.)
-
-`test.initialStates` is a list of strings, each being the name of a
-tokenizer state which can be one of the following:
-
-- `Data state`
-- `PLAINTEXT state`
-- `RCDATA state`
-- `RAWTEXT state`
-- `Script data state`
-- `CDATA section state`
-
- The test should be run once for each string, using it
-to set the tokenizer's initial state for that run. If
-`test.initialStates` is omitted, it defaults to `["Data state"]`.
-
-`test.lastStartTag` is a lowercase string that should be used as "the
-tag name of the last start tag to have been emitted from this
-tokenizer", referenced in the spec's definition of **appropriate end tag
-token**. If it is omitted, it is treated as if "no start tag has been
-emitted from this tokenizer".
-
-### Test results
-
-`test.output` is a list of tokens, ordered with the first produced by
-the tokenizer the first (leftmost) in the list. The list must mach the
-**complete** list of tokens that the tokenizer should produce. Valid
-tokens are:
-
-    ["DOCTYPE", name, public_id, system_id, correctness]
-    ["StartTag", name, {attributes}*, true*]
-    ["StartTag", name, {attributes}]
-    ["EndTag", name]
-    ["Comment", data]
-    ["Character", data]
-
-`public_id` and `system_id` are either strings or `null`. `correctness`
-is either `true` or `false`; `true` corresponds to the force-quirks flag
-being false, and vice-versa.
-
-When the self-closing flag is set, the `StartTag` array has `true` as
-its fourth entry. When the flag is not set, the array has only three
-entries for backwards compatibility.
-
-All adjacent character tokens are coalesced into a single
-`["Character", data]` token.
-
-If `test.doubleEscaped` is present and `true`, then every string within
-`test.output` must be further unescaped (as described above) before
-comparing with the tokenizer's output.
-
-xmlViolation tests
-------------------
-
-`tokenizer/xmlViolation.test` differs from the above in a couple of
-ways:
-
-- The name of the single member of the top-level JSON object is
-  "xmlViolationTests" instead of "tests".
-- Each test's expected output assumes that implementation is applying
-  the tweaks given in the spec's "Coercing an HTML DOM into an
-  infoset" section.
-
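The deleted README's **Basic Structure** and **Test set-up** sections describe a test file as a JSON object whose tests each expand into one tokenizer run per entry of `initialStates`. That list defaults to `["Data state"]`, and `tokenizer/xmlViolation.test` stores its tests under `xmlViolationTests` instead of `tests`. The following is a minimal Python sketch of that expansion. It assumes the file's JSON text is already in memory. The function name `iter_test_runs` and the keys of the yielded dicts are illustrative only and are not part of html5lib-tests.

```python
import json
from typing import Any, Iterator


def iter_test_runs(source: str) -> Iterator[dict[str, Any]]:
    """Yield one run per (test, initial state) pair from a .test file's JSON text.

    Hypothetical helper: the shape of the yielded dicts is an assumption.
    """
    data = json.loads(source)
    # Most files keep their tests under "tests"; tokenizer/xmlViolation.test
    # uses "xmlViolationTests" instead.
    key = "xmlViolationTests" if "xmlViolationTests" in data else "tests"
    for test in data[key]:
        # initialStates is optional and defaults to ["Data state"].
        for state in test.get("initialStates", ["Data state"]):
            yield {
                "description": test["description"],
                "input": test["input"],
                "expected": test["output"],
                "initial_state": state,
                # None stands for "no start tag has been emitted from this tokenizer".
                "last_start_tag": test.get("lastStartTag"),
                "errors": test.get("errors", []),
                "double_escaped": test.get("doubleEscaped", False),
            }
```

Driving the tokenizer from these per-run dicts keeps the default-state rule in one place instead of repeating it at every call site.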
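For tests with `doubleEscaped` set to `true`, the deleted README requires one more unescaping pass after JSON decoding. This pass turns each literal `\uHHHH` sequence into its code point, and it applies both to `test.input` and to every string inside `test.output`. The following is a minimal Python sketch under that reading. The helper names `unescape` and `unescape_all` are hypothetical. Each escape is converted on its own, so a lone surrogate such as `\uD800` stays a lone surrogate. Attribute names, which are dict keys, are unescaped along with the values, because the README says every string in `test.output` is affected.

```python
import re
from typing import Any

# A literal backslash, then 'u', then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(s: str) -> str:
    """Replace each \\uHHHH sequence with chr(0xHHHH)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def unescape_all(value: Any) -> Any:
    """Apply unescape() to every string nested inside lists and dicts."""
    if isinstance(value, str):
        return unescape(value)
    if isinstance(value, list):
        return [unescape_all(v) for v in value]
    if isinstance(value, dict):
        # Keys (attribute names) are strings too, so they are unescaped as well.
        return {unescape_all(k): unescape_all(v) for k, v in value.items()}
    return value  # numbers, booleans and None are left untouched
```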
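The deleted README ties `test.lastStartTag` to the spec's **appropriate end tag token** definition. In the WHATWG HTML tokenizer, an end tag token is appropriate only when its tag name matches that of the last start tag emitted, and no end tag is appropriate if no start tag has been emitted. The following is a minimal Python sketch of that check as a test harness might seed it. The function name is hypothetical, and the sketch assumes tag names are already lowercase, as the README states for `lastStartTag`.

```python
from typing import Optional


def is_appropriate_end_tag(end_tag_name: str, last_start_tag: Optional[str]) -> bool:
    """Check an end tag name against a test's lastStartTag value.

    last_start_tag is None when the test omits lastStartTag, which means
    no start tag has been emitted; in that case no end tag is appropriate.
    """
    return last_start_tag is not None and end_tag_name == last_start_tag
```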
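The deleted README's **Test results** section fixes the shape of each expected token. Adjacent character tokens are merged into one `["Character", data]` entry, a `StartTag` gets a fourth `true` entry only when its self-closing flag is set, and DOCTYPE `correctness` is the inverse of the force-quirks flag. The following is a minimal Python sketch that converts a tokenizer's output into that list form for comparison. The token dataclasses below are hypothetical stand-ins, since a real tokenizer's token API will differ.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Union


# Hypothetical token types standing in for a real tokenizer's API.
@dataclass
class Doctype:
    name: Optional[str]
    public_id: Optional[str]
    system_id: Optional[str]
    force_quirks: bool


@dataclass
class StartTag:
    name: str
    attributes: dict[str, str] = field(default_factory=dict)
    self_closing: bool = False


@dataclass
class EndTag:
    name: str


@dataclass
class Comment:
    data: str


@dataclass
class Character:
    data: str


Token = Union[Doctype, StartTag, EndTag, Comment, Character]


def to_test_format(tokens: list[Token]) -> list[list[Any]]:
    """Convert tokens into the list form used by test.output."""
    out: list[list[Any]] = []
    for tok in tokens:
        if isinstance(tok, Character):
            # Adjacent character tokens are coalesced into one entry.
            if out and out[-1][0] == "Character":
                out[-1][1] += tok.data
            else:
                out.append(["Character", tok.data])
        elif isinstance(tok, StartTag):
            entry: list[Any] = ["StartTag", tok.name, dict(tok.attributes)]
            if tok.self_closing:
                entry.append(True)  # fourth entry only when the flag is set
            out.append(entry)
        elif isinstance(tok, EndTag):
            out.append(["EndTag", tok.name])
        elif isinstance(tok, Comment):
            out.append(["Comment", tok.data])
        elif isinstance(tok, Doctype):
            # "correctness" is true exactly when force-quirks is false.
            out.append(["DOCTYPE", tok.name, tok.public_id, tok.system_id,
                        not tok.force_quirks])
    return out
```

Coalescing during conversion, instead of inside the tokenizer, leaves the tokenizer free to emit character tokens at whatever granularity is convenient.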