URI Resource Pages
URI Generic Syntax
RFC 3986, "Uniform Resource Identifier (URI): Generic Syntax", defines a grammar
for URIs in general. It covers the elements of URI syntax that are
common to all (or many) URI schemes, so that URIs can be parsed
independently of their scheme and individual scheme specifications can focus
on scheme-specific details.
This is a very brief overview; for details consult the RFC.
URI Generic Components
A URI includes:
- A scheme name, which refers to a specification for assigning identifiers
within that scheme.
- An optional authority component. This is the naming authority, to which
responsibility for governing the name space defined by the remainder of the
URI is delegated.
- An optional path, a hierarchy of components which
generally occur in order of decreasing significance. For some
URI schemes, the hierarchy is explicit and visible to generic parsing
algorithms; for other schemes, the hierarchy is visible only within the context
of the scheme itself and is opaque to generic URI processing.
- An optional query.
- An optional fragment identifier.
The path, together with the query, identifies a resource within the
scope of the URI's scheme and naming authority. (The fragment identifier,
if present, identifies a portion of the resource.)
Basic Structure
These components are put together as follows:
- Every URI begins with the scheme name.
- The scheme name is followed by a colon: ":".
- The colon is followed by a double slash: "//", if and
only if the authority component (next) is present.
- The double slash, if present, is followed by the authority
component (implicitly terminated by "/", "?",
or "#", or by the end of the URI).
- The authority component, if present (or, if not, the
colon after the scheme name), is followed by the path; components
within the path are separated by "/".
- A question mark: "?"
follows the path, if and only if the query (next) is present.
- The query.
- A number sign: "#"
follows the query (or the path, when there is no query), if and only if the
fragment identifier (next) is present.
- The fragment identifier.
So a whole URI (with all the parts, and with, say, two path components)
looks like this:
scheme://authority/path-component-1/path-component-2?query#fragment-id
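The assembly rules above can be sketched in a few lines of Python. This is only an illustration: the function name compose_uri and its parameters are hypothetical, not part of RFC 3986 or of any library, and the sketch does no validation or percent-encoding of its inputs.

```python
def compose_uri(scheme, authority=None, path_components=(), query=None, fragment=None):
    """Assemble a URI string from its generic components (illustrative only)."""
    uri = scheme + ":"                      # every URI begins with "scheme:"
    if authority is not None:
        uri += "//" + authority             # "//" appears only with an authority
    if path_components:
        joined = "/".join(path_components)  # path components are separated by "/"
        # After an authority, a "/" separates the authority from the path.
        uri += ("/" + joined) if authority is not None else joined
    if query is not None:
        uri += "?" + query                  # "?" appears only with a query
    if fragment is not None:
        uri += "#" + fragment               # "#" appears only with a fragment
    return uri


full = compose_uri(
    "scheme",
    authority="authority",
    path_components=["path-component-1", "path-component-2"],
    query="query",
    fragment="fragment-id",
)
print(full)
```

Passing None for an optional component omits both the component and its delimiter, which mirrors the "if and only if" wording of the rules.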
For example, in the following URI:
http://srw.cheshire3.org/l5r?operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10
- "http" is the scheme name;
- "srw.cheshire3.org" is the authority component;
- "l5r" is the path;
- "operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10"
is the query. (See note below.)
- There is no fragment identifier.
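As a rough check, the same breakdown can be reproduced with the urlsplit function from Python's standard urllib.parse module. The variable names are chosen for illustration. Note that urlsplit calls the authority "netloc" and, following RFC 3986, reports the path with its leading slash ("/l5r").

```python
from urllib.parse import urlsplit

uri = ("http://srw.cheshire3.org/l5r"
       "?operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10")

parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.netloc)    # srw.cheshire3.org  (the authority component)
print(parts.path)      # /l5r
print(parts.query)     # operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10
print(parts.fragment)  # empty string: there is no fragment identifier
```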
Reserved Characters
A number of characters are “reserved” for
use as delimiters, most importantly, the slash ("/"), question mark
("?"), and number sign ("#"), as seen in the
above example.
In addition, there are characters reserved as sub-delimiters,
which delimit sub-components. The query component
above (see note below), which follows "?", is itself composed of
sub-components, which we informally refer to as query parameters. These
all take the form:
- parameter name,
- followed by "=",
- parameter value,
- followed by "&" (however the "&" is
omitted following the last parameter).
Thus "=" and "&" are also reserved
characters. The percent sign ("%") likewise has a special role, because
it introduces percent-encoding (described below).
Note: In the example there is a parameter named "query",
within the query component. The parameter 'query' should
not be confused with the query component.
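To make the sub-delimiter roles concrete, the query component of the example can be split into its parameters with parse_qsl from Python's standard urllib.parse module. This is a minimal sketch; the variable names are illustrative.

```python
from urllib.parse import parse_qsl

query = "operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10"

# parse_qsl splits on "&" to find parameters and on "=" to separate
# each parameter name from its value, preserving their order.
params = parse_qsl(query)

for name, value in params:
    print(name, "->", value)
```

The third pair has the name "query" and the value "sword": a parameter inside the query component, not the query component itself.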
Escaping Reserved Characters
An escape mechanism is used to represent a reserved character when it
occurs within the URI in a role other than its reserved one. The character
is encoded as a character triplet: the percent character "%" followed
by the two hexadecimal digits representing that character's numeric value.
For example, "%20" is the percent-encoding for binary "00100000", which
corresponds (in US-ASCII) to the space character.
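The triplet construction can be sketched in Python as follows. The helper percent_encode_char is hypothetical and handles a single US-ASCII character only; quote and unquote are the standard urllib.parse functions, used here with safe="" so that every reserved character is escaped.

```python
from urllib.parse import quote, unquote


def percent_encode_char(ch: str) -> str:
    """Encode one US-ASCII character as "%" plus two uppercase hex digits."""
    return "%" + format(ord(ch), "02X")


space_bits = format(ord(" "), "08b")     # binary value of the space character
encoded_space = percent_encode_char(" ")

# A literal "&" and "=" inside a value must be escaped so they are not
# read as sub-delimiters.
encoded_literal = quote("a&b=c", safe="")
decoded = unquote(encoded_literal)

print(space_bits)       # 00100000
print(encoded_space)    # %20
print(encoded_literal)  # a%26b%3Dc
print(decoded)          # a&b=c
```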
Character Set/Encoding
A URI is a sequence of characters from a limited set: the letters of the basic
Latin alphabet, digits, and a few special characters. Characters outside of
this set may be percent-encoded as described above. In general, no particular
encoding is prescribed for URI characters; a URI is assumed to be in the same
character encoding as the surrounding text.
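One practical consequence is that the same character can yield different percent-encoded forms depending on the character encoding in use. The sketch below uses the encoding parameter of Python's urllib.parse.quote and unquote; the choice of "é" and of the two encodings is purely illustrative.

```python
from urllib.parse import quote, unquote

ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, outside the basic URI character set

as_utf8 = quote(ch, encoding="utf-8")      # two octets in UTF-8
as_latin1 = quote(ch, encoding="latin-1")  # one octet in ISO-8859-1

print(as_utf8)    # %C3%A9
print(as_latin1)  # %E9

# Decoding recovers the character only when the same encoding is assumed.
print(unquote(as_utf8, encoding="utf-8") == ch)      # True
print(unquote(as_latin1, encoding="latin-1") == ch)  # True
```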