Age | Commit message | Author |
|
|
* Extract methods to copy various elements of one URI from another.
* Push NormalizePath implementation into a private method.
* Simplify and consolidate checks for absolute paths.
* Extract methods out of individual steps of ParseFromString.
|
|
Add comments that link parts of the code back
to lines of the pseudocode in the RFC,
to make the code easier to understand.
|
|
The former algorithm was based on the pseudocode
from the RFC, which is hard to follow and better suited
to a path held in a single string than to a sequence
of segments.
The new algorithm uses two flags:
* isAbsolute - recognize that if the path starts out
as an absolute path, it needs to stay that way.
* atDirectoryLevel - recognize that if we encounter
a "." or "..", it is reduced by simply discarding it
or going back up one step, but we are then in a
"directory" context: should the path end at this
point, there needs to be an empty-string segment to
mark that the end of the path reaches into a
directory rather than just referring to it.
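
The two-flag approach described above could look roughly like the
following. This is a minimal sketch, not the library's actual code: the
function name NormalizeSegments is hypothetical, and it assumes an absolute
path is represented by a leading empty-string segment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: normalize a path held as a sequence of segments.
// Assumption: an absolute path starts with an empty-string segment, and a
// trailing empty-string segment means "inside this directory".
std::vector<std::string> NormalizeSegments(const std::vector<std::string>& input) {
    std::vector<std::string> output;
    // An absolute path must stay absolute, so ".." may never remove
    // the leading empty segment.
    const bool isAbsolute = !input.empty() && input.front().empty();
    const std::size_t floor = isAbsolute ? 1 : 0;
    bool atDirectoryLevel = false;
    for (const auto& segment : input) {
        if (segment == ".") {
            // Discard it, but remember we now point into a directory.
            atDirectoryLevel = true;
        } else if (segment == "..") {
            // Go back up one step, if there is anything to go back to.
            if (output.size() > floor) {
                output.pop_back();
            }
            atDirectoryLevel = true;
        } else {
            // Ordinary segments transfer over; an empty segment is only
            // kept if we are not already in a directory context.
            if (!atDirectoryLevel || !segment.empty()) {
                output.push_back(segment);
            }
            atDirectoryLevel = segment.empty();
        }
    }
    // Ending in a directory context: mark it with an empty segment.
    if (atDirectoryLevel && !output.empty() && !output.back().empty()) {
        output.push_back("");
    }
    return output;
}
```

Under these assumptions, the segments of "/a/b/../c" reduce to those of
"/a/c", and the segments of "/a/." reduce to those of "/a/".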
|
|
Path normalization is hideously broken for now.
|
|
Such a URI should be considered equivalent to a path of "/"
because in both cases the path is an absolute path.
|
|
For normalization "step 2C", if the output path was
empty, we don't want to pop the end of it off.
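
For illustration, the guard described above might look like this when the
output is held as a single string, as in the RFC pseudocode. The helper
name RemoveLastSegment is an assumption made for this sketch.

```cpp
#include <string>

// Hypothetical helper for "step 2C": remove the last segment and its
// preceding "/" (if any) from the output buffer.
void RemoveLastSegment(std::string& output) {
    if (output.empty()) {
        // Nothing to pop; leave the empty output alone.
        return;
    }
    const auto lastSlash = output.rfind('/');
    if (lastSlash == std::string::npos) {
        output.clear();
    } else {
        output.erase(lastSlash);
    }
}
```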
|
|
* Code the neat example in section 6.2.2 of the RFC.
* Add equality/inequality operators for Uri.
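
As a rough sketch of what such operators might look like, the example
below compares already-normalized components. The SimpleUri struct is a
hypothetical stand-in, not the library's Uri class. The RFC 3986 section
6.2.2 example treats "example://a/b/c/%7Bfoo%7D" and
"eXAMPLE://a/./b/../b/%63/%7bfoo%7d" as equivalent, so both would need to
be normalized before a comparison like this one.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed and normalized URI.
struct SimpleUri {
    std::string scheme;
    std::string host;
    std::vector<std::string> path;
};

// Two URIs are equal when all of their components are equal.
bool operator==(const SimpleUri& lhs, const SimpleUri& rhs) {
    return lhs.scheme == rhs.scheme
        && lhs.host == rhs.host
        && lhs.path == rhs.path;
}

// Inequality is defined in terms of equality to keep the two consistent.
bool operator!=(const SimpleUri& lhs, const SimpleUri& rhs) {
    return !(lhs == rhs);
}
```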
|
|
Extract methods that parse the query and fragment.
|
|
* Replaced the more formal "state machine" used to decode URI
elements that may have percent-encoded characters with
a simpler loop that uses a flag and a few conditional
branches.
* Extracted the parsing of the above types of elements into
a common method, DecodeElement.
* Kept DecodeQueryOrFragment around, in order to prevent
having to repeat the name of the allowed character set which
is common between query and fragment; however the function
is now just a very thin wrapper.
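
A loop of the kind described above might look like the following. This is
a sketch under assumptions: the signature of DecodeElement shown here
(allowed characters passed as a plain string) is hypothetical and may
differ from the real method.

```cpp
#include <string>

// Hypothetical sketch: decode percent-encoded characters in place.
// Returns false if the element is malformed or contains a character
// outside allowedCharacters; in that case the element is left unchanged.
bool DecodeElement(std::string& element, const std::string& allowedCharacters) {
    std::string decoded;
    bool decodingEscape = false;  // true while reading the two hex digits
    int digitsSeen = 0;
    int value = 0;
    for (const char c : element) {
        if (decodingEscape) {
            int digit;
            if (c >= '0' && c <= '9') {
                digit = c - '0';
            } else if (c >= 'A' && c <= 'F') {
                digit = c - 'A' + 10;
            } else if (c >= 'a' && c <= 'f') {
                digit = c - 'a' + 10;
            } else {
                return false;
            }
            value = value * 16 + digit;
            if (++digitsSeen == 2) {
                decoded.push_back(static_cast<char>(value));
                decodingEscape = false;
            }
        } else if (c == '%') {
            decodingEscape = true;
            digitsSeen = 0;
            value = 0;
        } else if (allowedCharacters.find(c) != std::string::npos) {
            decoded.push_back(c);
        } else {
            return false;
        }
    }
    if (decodingEscape) {
        return false;  // input ended in the middle of an escape
    }
    element = decoded;
    return true;
}
```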
|
|
* Remove IsCharacterInSet function
|
|
Added CharacterSet as a class to represent character sets,
allowing us to build singletons and composite character sets
more concisely.
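
One way such a class could be shaped is sketched below. The constructors
and the Contains method are assumptions made for illustration and are not
taken from the actual source.

```cpp
#include <initializer_list>
#include <set>

// Hypothetical sketch of a character-set class.
class CharacterSet {
public:
    CharacterSet() = default;

    // Singleton set. Deliberately implicit, so that plain characters can
    // appear directly in a composite's initializer list.
    CharacterSet(char c) { members_.insert(c); }

    // Inclusive range of characters, for example ('a', 'z').
    CharacterSet(char first, char last) {
        for (int c = first; c <= last; ++c) {
            members_.insert(static_cast<char>(c));
        }
    }

    // Composite set: the union of all the given sets.
    CharacterSet(std::initializer_list<CharacterSet> sets) {
        for (const auto& set : sets) {
            members_.insert(set.members_.begin(), set.members_.end());
        }
    }

    bool Contains(char c) const { return members_.count(c) > 0; }

private:
    std::set<char> members_;
};
```

Note that in this sketch a range has to be written with parentheses,
CharacterSet('a', 'z'). With braces, the initializer-list constructor
would be chosen and the result would be the union of the two singletons.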
|
|
* Extract IsCharacterInSet to its own module.
* Extract PercentEncodedCharacterDecoder to its own module.
|
|
Remove state 3 hole in host/port parsing state machine
|
|
Extract percent-encoded character decoding, so that
the logic is all in one class that is reused.
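
A reusable decoder class along these lines might be fed the two hex digits
that follow a "%" one at a time. The method names below are assumptions
for this sketch, not a record of the real interface.

```cpp
// Hypothetical sketch of a reusable percent-encoded character decoder.
class PercentEncodedCharacterDecoder {
public:
    // Feed the next character after the '%'.
    // Returns false if it is not a hexadecimal digit.
    bool NextEncodedCharacter(char c) {
        int digit;
        if (c >= '0' && c <= '9') {
            digit = c - '0';
        } else if (c >= 'A' && c <= 'F') {
            digit = c - 'A' + 10;
        } else if (c >= 'a' && c <= 'f') {
            digit = c - 'a' + 10;
        } else {
            return false;
        }
        decoded_ = decoded_ * 16 + digit;
        ++digitsSeen_;
        return true;
    }

    // True once both hex digits have been consumed.
    bool Done() const { return digitsSeen_ == 2; }

    // Only meaningful once Done() returns true.
    char GetDecodedCharacter() const { return static_cast<char>(decoded_); }

private:
    int decoded_ = 0;
    int digitsSeen_ = 0;
};
```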
|
|
The path may also contain a colon, so make sure we don't
scan into the path element if there is one.
|
|
* Detect bad characters in host names.
* Incorporate splitting host and port into the state
machine that is parsing/decoding the host.
NOTE:
IPv6address is not checked for bad characters yet.
More research is needed to learn the various ways
an IPv6 address can be written.
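
To illustrate folding the host/port split into the host state machine, here
is a simplified sketch. The HostPort struct, the SplitHostPort name, and
the exact set of accepted host characters are assumptions. Percent-encoded
characters in the host are not handled here, and, as in the note above, the
contents of a bracketed IP literal are not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct HostPort {
    std::string host;
    std::string port;
    bool valid = true;
};

HostPort SplitHostPort(const std::string& text) {
    enum class State { RegName, IpLiteral, AfterIpLiteral, Port };
    // Assumed punctuation allowed in a registered name.
    static const std::string extraHostCharacters = "-._~!$&'()*+,;=";
    HostPort result;
    State state = State::RegName;
    std::size_t i = 0;
    if (!text.empty() && text[0] == '[') {
        state = State::IpLiteral;
        i = 1;
    }
    for (; i < text.size(); ++i) {
        const char c = text[i];
        const auto uc = static_cast<unsigned char>(c);
        switch (state) {
            case State::RegName:
                if (c == ':') {
                    state = State::Port;
                } else if (std::isalnum(uc) || extraHostCharacters.find(c) != std::string::npos) {
                    result.host.push_back(c);
                } else {
                    result.valid = false;  // bad character in host name
                    return result;
                }
                break;
            case State::IpLiteral:
                if (c == ']') {
                    state = State::AfterIpLiteral;
                } else {
                    result.host.push_back(c);  // not validated yet
                }
                break;
            case State::AfterIpLiteral:
                if (c == ':') {
                    state = State::Port;
                } else {
                    result.valid = false;
                    return result;
                }
                break;
            case State::Port:
                if (std::isdigit(uc)) {
                    result.port.push_back(c);
                } else {
                    result.valid = false;
                    return result;
                }
                break;
        }
    }
    if (state == State::IpLiteral) {
        result.valid = false;  // missing closing bracket
    }
    return result;
}
```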
|
|
A colon may be in the authority, if present, so limit
the search for scheme delimiter so we aren't scanning
the authority part, when parsing the scheme.
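
One simple way to limit the search is to look for the colon only in the
text before the first slash, which keeps it out of both the authority and
the path. The function name ExtractScheme is hypothetical, and this is a
sketch rather than the code in the commit.

```cpp
#include <string>

// Hypothetical sketch: return the scheme of a URI string, or an empty
// string if there is none. Only the part before the first '/' is
// searched, so colons in the authority or path are never mistaken for
// the scheme delimiter.
std::string ExtractScheme(const std::string& uri) {
    const auto firstSlash = uri.find('/');
    const auto colon = uri.substr(0, firstSlash).find(':');
    if (colon == std::string::npos) {
        return "";
    }
    return uri.substr(0, colon);
}
```

Under this approach, "//www.example.com:8080/foo" has no scheme even
though it contains a colon, while "http://www.example.com:8080/foo" yields
"http".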
|
|
Extracted IsCharacterInSet function
|
|
Extract method ParseAuthority
|
|
Extract method that parses the path segments from
the whole path string.
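
Such a method might look like the sketch below. The name SplitPath is
hypothetical, and treating "/" as a single empty segment is an assumption
made here, not something the commit message states.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: split a path string into segments on '/'.
// A leading '/' produces a leading empty segment, marking the path as
// absolute.
std::vector<std::string> SplitPath(const std::string& path) {
    std::vector<std::string> segments;
    if (path.empty()) {
        return segments;
    }
    if (path == "/") {
        // Assumption: the root path is just the "absolute" marker.
        segments.push_back("");
        return segments;
    }
    std::size_t start = 0;
    for (;;) {
        const auto slash = path.find('/', start);
        if (slash == std::string::npos) {
            segments.push_back(path.substr(start));
            break;
        }
        segments.push_back(path.substr(start, slash - start));
        start = slash + 1;
    }
    return segments;
}
```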
|
|
* Extract function that parses 16-bit unsigned integers,
to use in parsing port element.
* Clean up and clarify what parts of the original URI
string are still being held onto at various points
in the code.
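
A function for parsing 16-bit unsigned integers, as mentioned in the first
item above, could be sketched as follows. The name ToUint16 and the choice
to reject an empty string are assumptions; a caller that permits an empty
port would need to handle that case itself.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: parse a decimal 16-bit unsigned integer.
// Returns false for empty input, non-digit characters, or values above
// 65535; the output parameter is only written on success.
bool ToUint16(const std::string& text, std::uint16_t& number) {
    if (text.empty()) {
        return false;
    }
    std::uint32_t value = 0;
    for (const char c : text) {
        if (c < '0' || c > '9') {
            return false;
        }
        value = value * 10 + static_cast<std::uint32_t>(c - '0');
        if (value > 65535) {
            return false;  // would not fit in 16 bits
        }
    }
    number = static_cast<std::uint16_t>(value);
    return true;
}
```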
|
|
* Add IsRelativeReference.
* Add IsRelativePath.
* Add Query.
* Add Fragment.
* Add UserInfo.
* Fix parsing of URIs that have no scheme.
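
The two predicates in the list above can be sketched over a simplified
structure. ParsedUri is a hypothetical stand-in for the real class, and it
assumes an absolute path is stored with a leading empty segment. In RFC
3986 terms, a relative reference is a URI reference with no scheme.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed URI.
struct ParsedUri {
    std::string scheme;
    std::vector<std::string> path;
};

// A relative reference is one that has no scheme.
bool IsRelativeReference(const ParsedUri& uri) {
    return uri.scheme.empty();
}

// Assumption: a path is relative unless it begins with an empty segment.
bool IsRelativePath(const ParsedUri& uri) {
    return uri.path.empty() || !uri.path.front().empty();
}
```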
|
|
* Parts of a path are called "segments", not "steps",
in the RFC.
* The RFC specifies that path separators are always
forward slashes, so don't support other separators.
|
|
* Can now parse URIs from strings.
* This supports scheme, host, and path.
* Path separator defaults to "/" but may be customized.
|
|
|