IFLA/CDNL Alliance for Bibliographic Standards - Library of Congress

URI Resource Pages

URI Generic Syntax

RFC 3986, "Uniform Resource Identifier (URI): Generic Syntax", defines a grammar pertaining to URIs in general, leaving scheme-specific details to other specifications. It defines elements of the URI syntax common to all (or many) URI schemes, for scheme-independent parsing of URIs, allowing individual-scheme specs to focus on scheme-specific details.

This is a very brief overview; for details consult the RFC.

URI Generic Components

A URI includes:

  • A scheme name, which refers to a specification for assigning identifiers within that scheme.
  • An optional authority component. This is the naming authority, which is delegated responsibility for governing the namespace defined by the remainder of the URI.
  • A path, which may be empty: a hierarchy of components that generally occur in order of decreasing significance. For some URI schemes, the hierarchy is explicit and visible to generic parsing algorithms; for other schemes, the hierarchy is visible only within the context of the scheme itself and is opaque to generic URI processing.
  • An optional query.
  • An optional fragment identifier.

The path, together with the query, identifies a resource within the scope of the URI's scheme and naming authority. (The fragment identifier, if present, identifies a portion of the resource.)

Basic Structure

These components are put together as follows:  

  • Every URI begins with the scheme name.
  • The scheme name is followed by a colon: ":".
  • The colon is followed by a double slash: "//", if and only if the authority component (next) is present.
  • The double slash, if present, is followed by the authority component, which is implicitly terminated by "/", "?", or "#", or by the end of the URI.
  • The authority component, if present (or, if not, the colon after the scheme name), is followed by the path; components within the path are separated by "/".
  • A question mark: "?", follows the path if and only if the query (next) is present.
  • The query, if present, follows the question mark.
  • A number sign: "#", follows the query (or the path, when there is no query) if and only if the fragment identifier (next) is present.
  • The fragment identifier, if present, follows the number sign.
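
The assembly rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name compose_uri and its parameters are hypothetical, the host name www.example.org is a placeholder, and the function performs no validation or percent-encoding. A value of None means that a component is absent, which is not the same as present but empty.

```python
from typing import Optional


def compose_uri(scheme: str,
                authority: Optional[str] = None,
                path: str = "",
                query: Optional[str] = None,
                fragment: Optional[str] = None) -> str:
    """Assemble a URI string from its generic components (no validation)."""
    uri = scheme + ":"              # scheme name, then a colon
    if authority is not None:
        uri += "//" + authority     # "//" appears only with an authority
    uri += path                     # path components are separated by "/"
    if query is not None:
        uri += "?" + query          # "?" appears only with a query
    if fragment is not None:
        uri += "#" + fragment       # "#" appears only with a fragment id
    return uri


# Hypothetical example values.
print(compose_uri("http", "www.example.org", "/a/b", "x=1", "top"))
# http://www.example.org/a/b?x=1#top
```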

So a whole URI (with all the parts, and with, say, two path components) looks like this:

scheme://authority/path-component-1/path-component-2?query#fragment-id
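
A string of this shape can be split back into its components with the regular expression given in Appendix B of RFC 3986. The sketch below wraps that expression in a helper; the name split_uri and the dictionary layout are assumptions made for illustration, not part of the RFC. Note that the expression reports the path with its leading slash.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI reference into its five generic components.

    A component that is absent is reported as None; the path is always
    present but may be the empty string.
    """
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri(
    "scheme://authority/path-component-1/path-component-2?query#fragment-id"
)
for name, value in parts.items():
    print(name, "=", value)
```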

For example, in the following URI:

http://srw.cheshire3.org/l5r?operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10

  • "http" is the scheme name;
  • "srw.cheshire3.org" is the authority component;
  • "/l5r" is the path, with the single component "l5r";
  • "operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10" is the query. (See note below.)
  • There is no fragment identifier.
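
The same breakdown can be reproduced with a general-purpose URI parser. The sketch below uses urlsplit from Python's standard urllib.parse module as one possible tool; that module calls the authority component "netloc", reports the path with its leading slash, and returns an empty string for a missing fragment.

```python
from urllib.parse import urlsplit

uri = ("http://srw.cheshire3.org/l5r"
       "?operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10")
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.netloc)    # srw.cheshire3.org  (urllib's name for the authority)
print(parts.path)      # /l5r
print(parts.query)     # operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10
print(parts.fragment)  # (empty string: there is no fragment identifier)
```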

Reserved Characters

A number of characters are “reserved” for use as delimiters, most importantly the slash ("/"), question mark ("?"), and number sign ("#"); the first two appear in the example above.

In addition, there are characters reserved as sub-delimiters, which delimit sub-components. The query component above (see note below), which follows "?", is itself composed of sub-components that we informally refer to as query parameters. Each takes the form:

  • parameter name,
  • followed by "=",
  • parameter value,
  • followed by "&" (however the "&" is omitted following the last parameter).

Thus "=" and "&" are also reserved characters. The percent character ("%") is also treated specially, because it introduces a percent-encoded triplet (described below); a literal "%" is therefore written as "%25".

Note: The example contains a parameter named "query" within the query component. The "query" parameter should not be confused with the query component itself.
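
As an illustration, the query component of the example can be split into its parameters at the "&" and "=" sub-delimiters. The sketch below uses parse_qsl and urlencode from Python's standard urllib.parse module; treating the query as name=value pairs is the informal convention described above, not something the generic syntax requires.

```python
from urllib.parse import parse_qsl, urlencode

# The query component of the example URI (everything after "?").
query = "operation=searchRetrieve&version=1.1&query=sword&maximumRecords=10"

# Split at "&" into parameters, and at "=" into (name, value) pairs.
params = parse_qsl(query)
for name, value in params:
    print(name, "->", value)

# Joining the pairs again with "=" and "&" restores the original string.
rebuilt = urlencode(params)
print(rebuilt == query)  # True
```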

Escaping Reserved Characters

An escape mechanism, percent-encoding, is used to represent a character from the reserved set when it occurs in a URI in a non-reserved role. The character is encoded as a triplet: the percent character "%" followed by the two hexadecimal digits representing that character's numeric value. For example, "%20" is the percent-encoding for the binary octet "00100000", which corresponds (in US-ASCII) to the space character.
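
A minimal sketch of this mechanism, using quote, unquote, and parse_qsl from Python's standard urllib.parse module. The parameter value "swords & shields" is a hypothetical example chosen because it contains "&", which would otherwise be read as a parameter delimiter.

```python
from urllib.parse import parse_qsl, quote, unquote

value = "swords & shields"        # hypothetical parameter value

# safe="" asks quote() to percent-encode every character except
# letters, digits, and "-", ".", "_", "~".
encoded = quote(value, safe="")
print(encoded)                    # swords%20%26%20shields
print(unquote(encoded))           # swords & shields

# The encoded "&" is data, not a delimiter, so only two parameters are found.
print(parse_qsl("query=" + encoded + "&version=1.1"))
# [('query', 'swords & shields'), ('version', '1.1')]

# The triplet is "%" plus the two hex digits of the character's value.
print("%{:02X}".format(ord(" ")))  # %20
```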

Character Set/Encoding

A URI is a sequence of characters from a limited set: the letters of the basic Latin alphabet, digits, and a few special characters. Characters outside this set must be percent-encoded, as described above, before they can appear in a URI. In general, no particular encoding is prescribed for URI characters; a URI is assumed to be in the same character encoding as the surrounding text.
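
Because percent-encoding operates on octets, the same character can produce different triplets depending on the character encoding chosen to turn it into octets. The sketch below, using Python's standard urllib.parse module, contrasts UTF-8 and Latin-1; the sample text is hypothetical, and the choice of UTF-8 as the first case reflects common practice rather than a rule stated on this page.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"   # hypothetical value: "cafe" with an acute accent on the "e"

# Same character, different octets, hence different percent-encodings.
as_utf8 = quote(text, safe="", encoding="utf-8")
as_latin1 = quote(text, safe="", encoding="latin-1")
print(as_utf8)     # caf%C3%A9
print(as_latin1)   # caf%E9

# Decoding recovers the text only if the same encoding is assumed.
print(unquote(as_utf8, encoding="utf-8") == text)       # True
print(unquote(as_latin1, encoding="latin-1") == text)   # True
print(unquote(as_latin1, encoding="utf-8") == text)     # False
```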

 

  April 13, 2007