remove html5lib-tests

author: Markus Unterwaditzer <markus-honeypot@unterwaditzer.net> 2021-11-26 13:17:39 +0100
committer: Markus Unterwaditzer <markus-honeypot@unterwaditzer.net> 2021-11-26 13:17:39 +0100
commit: e1cdb4a6ac40aa562605990d58425978a5dc295b (patch)
tree: 48ee360700f05443364b95bbaedcfdb809ada6ea /tests/html5lib-tests/tokenizer/README.md
parent: bb1e11cc9421c3096d82c4fceb74bb8f0aa82201 (diff)
1 files changed, 0 insertions, 107 deletions
diff --git a/tests/html5lib-tests/tokenizer/README.md b/tests/html5lib-tests/tokenizer/README.md
deleted file mode 100644
index 66b81e8..0000000
--- a/tests/html5lib-tests/tokenizer/README.md
+++ /dev/null
@@ -1,107 +0,0 @@
-Tokenizer tests
-===============
-
-The test format is [JSON](http://www.json.org/). This has the advantage
-that the syntax allows backward-compatible extensions to the tests and
-the disadvantage that it is relatively verbose.
-
-Basic Structure
----------------
-
-    {"tests": [
-        {"description": "Test description",
-        "input": "input_string",
-        "output": [expected_output_tokens],
-        "initialStates": [initial_states],
-        "lastStartTag": last_start_tag,
-        "errors": [parse_errors]
-        }
-    ]}
-
-Multiple tests per file are allowed simply by adding more objects to the
-"tests" list.
-
-Each parse error is an object that contains an error `code` and one-based
-error location indices: `line` and `col`.
-
-`description`, `input` and `output` are always present. The other values
-are optional.
-
-### Test set-up
-
-`test.input` is a string containing the characters to pass to the
-tokenizer. Specifically, it represents the characters of the **input
-stream**, and so implementations are expected to perform the processing
-described in the spec's **Preprocessing the input stream** section
-before feeding the result to the tokenizer.
-
-If `test.doubleEscaped` is present and `true`, then `test.input` is not
-quite as described above. Instead, it must first be subjected to another
-round of unescaping (i.e., in addition to any unescaping involved in the
-JSON import), and the result of *that* represents the characters of the
-input stream. Currently, the only unescaping required by this option is
-to convert each sequence of the form \\uHHHH (where H is a hex digit)
-into the corresponding Unicode code point. (Note that this option also
-affects the interpretation of `test.output`.)
-
-`test.initialStates` is a list of strings, each being the name of a
-tokenizer state which can be one of the following:
-
--   `Data state`
--   `PLAINTEXT state`
--   `RCDATA state`
--   `RAWTEXT state`
--   `Script data state`
--   `CDATA section state`
-
-The test should be run once for each string, using it
-to set the tokenizer's initial state for that run. If
-`test.initialStates` is omitted, it defaults to `["Data state"]`.
-
-`test.lastStartTag` is a lowercase string that should be used as "the
-tag name of the last start tag to have been emitted from this
-tokenizer", referenced in the spec's definition of **appropriate end tag
-token**. If it is omitted, it is treated as if "no start tag has been
-emitted from this tokenizer".
-
-### Test results
-
-`test.output` is a list of tokens, ordered so that the first token
-produced by the tokenizer is the first (leftmost) in the list. The list
-must match the **complete** list of tokens that the tokenizer should
-produce. Valid tokens are:
-
-    ["DOCTYPE", name, public_id, system_id, correctness]
-    ["StartTag", name, {attributes}*, true*]
-    ["StartTag", name, {attributes}]
-    ["EndTag", name]
-    ["Comment", data]
-    ["Character", data]
-
-`public_id` and `system_id` are either strings or `null`. `correctness`
-is either `true` or `false`; `true` corresponds to the force-quirks flag
-being false, and vice-versa.
-
-When the self-closing flag is set, the `StartTag` array has `true` as
-its fourth entry. When the flag is not set, the array has only three
-entries for backwards compatibility.
-
-All adjacent character tokens are coalesced into a single
-`["Character", data]` token.
-
-If `test.doubleEscaped` is present and `true`, then every string within
-`test.output` must be further unescaped (as described above) before
-comparing with the tokenizer's output.
-
-xmlViolation tests
-------------------
-
-`tokenizer/xmlViolation.test` differs from the above in a couple of
-ways:
-
--   The name of the single member of the top-level JSON object is
-    "xmlViolationTests" instead of "tests".
-   Each test's expected output assumes that the implementation is applying
-    the tweaks given in the spec's "Coercing an HTML DOM into an
-    infoset" section.
-
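
For readers who still need to consume these fixtures after the README's removal, the set-up rules it describes (the `doubleEscaped` unescaping pass and one run per entry in `initialStates`) can be sketched roughly as follows. This is only an illustration in Python, not code from this repository: the helper names `unescape`, `unescape_token` and `expand_test` are hypothetical, and defaulting a missing `errors` key to an empty list is an assumption, since the README only says the field is optional.

```python
import re

# Matches a literal backslash, "u", and four hex digits, i.e. the \uHHHH
# sequences that remain in a string after the normal JSON import.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(text):
    """Replace each literal \\uHHHH sequence with the corresponding code point."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def unescape_token(token):
    """Unescape every string inside one expected-output token."""
    result = []
    for item in token:
        if isinstance(item, str):
            result.append(unescape(item))
        elif isinstance(item, dict):
            # Attribute names and values are strings too.
            result.append({unescape(k): unescape(v) for k, v in item.items()})
        else:
            # null, true and false are passed through unchanged.
            result.append(item)
    return result


def expand_test(test):
    """Yield one run description per initial state of a single test object."""
    double_escaped = test.get("doubleEscaped") is True
    text = test["input"]
    output = test["output"]
    if double_escaped:
        text = unescape(text)
        output = [unescape_token(token) for token in output]
    # A missing "initialStates" defaults to ["Data state"], per the README.
    for state in test.get("initialStates", ["Data state"]):
        yield {
            "description": test["description"],
            "state": state,
            "input": text,
            "output": output,
            # None stands for "no start tag has been emitted".
            "lastStartTag": test.get("lastStartTag"),
            # Assumption: treat an absent "errors" key as no expected errors.
            "errors": test.get("errors", []),
        }
```

A harness would call `expand_test` on each object in the `"tests"` list (or `"xmlViolationTests"` for `tokenizer/xmlViolation.test`), set the tokenizer's initial state and last start tag from each yielded run, and compare the coalesced token list against `output`.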