Organization of the Standard
Numbered Sections
The standard is organized in a hierarchy of data elements and compound elements that define the information content for metadata to document a set of digital geospatial data. The starting point is "metadata" (section 0). The compound element "metadata" is composed of other compound elements representing different concepts about the data set. Each of these compound elements has a numbered section in the standard. In each numbered section, these compound elements are defined by other compound elements and data elements. The section "contact information" is a special section that specifies the data elements for contacting individuals and organizations. This section is used by other sections, and is defined once for convenience.
Each section begins with the name and definition of the compound element that defines the section. The name and definition are followed by production rules (see below) that define this compound element in terms of data elements, either directly or by the use of intermediate compound elements. When intermediate compound elements are used, the production rules for these elements also are provided in this part of the section.
Additional information about the organization of the Standard follows:
- The production rules are followed by a list of names and definitions of compound elements and data elements used in the section.
- Section and element numbers are provided for user navigation of the standard. They are neither authoritative nor intended for use in implementation and are subject to change in future revisions of the standard.
Compound Elements
A compound element is a group of data elements and other compound elements. All compound elements are described by data elements, either directly or through intermediate compound elements. Compound elements represent higher-level concepts that cannot be represented by individual data elements. The form for the definition of compound elements is:
- Compound element name -- definition.
- Type: compound
- Short Name:
The type of "compound" uniquely identifies the compound elements in the lists of terms and definitions.
Short names consisting of eight alphabetic characters or less are included to assist in implementation of the standard.
Data Elements
A data element is a logically primitive item of data. The entry for a data element includes the name of the data element, the definition of the data element, a description of the values that can be assigned to the data element, and a short name for the data element. The form for the definition of the data elements is:
- Data element name -- definition.
- Type:
- Domain:
- Short Name:
The information about the values for the data elements includes a description of the type of the value and a description of the domain of the valid values. The type of the data element describes the kind of value to be provided. The choices are "integer" for integer numbers, "real" for real numbers, "text" for ASCII characters, "date" for day of the year, and "time" for time of the day.
The domain describes valid values that can be assigned to the data element. The domain may specify a list of valid values, references to lists of valid values, or restrictions on the range of values that can be assigned to a data element.
The domain also may note that the domain is free from restrictions, and any values that can be represented by the "type" of the data element can be assigned. These unrestricted domains are represented by the use of the word "free" followed by the type of the data element (that is, free text, free date, free real, free time, free integer). Some domains can be partly, but not completely, specified. For example, there are several widely used data transfer formats, but there may be many more that are less well known. To allow a producer to describe its data in these circumstances, the convention of providing a list of values followed by the designation of a "free" domain was used. In these cases, assignments of values shall be made from the provided domain when possible. When not possible, providers may create and assign their own value. A created value shall not redefine a value provided by the standard.
Short names consisting of eight alphabetic characters or less are included to assist in user implementation of the standard.
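The type, domain, and short-name rules above can be made concrete with a small validator. The following is a minimal sketch, not part of the standard: the `DataElement` class, its field names, and the example elements are hypothetical. It assumes a domain is either a list of values, an inclusive numeric range, "free", or a list followed by "free". Date and time values are only checked for being non-empty here, and the rule that a created value shall not redefine a value provided by the standard cannot be checked mechanically.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Short names: one to eight alphabetic characters.
SHORT_NAME = re.compile(r"[A-Za-z]{1,8}")
TYPES = {"integer", "real", "text", "date", "time"}


@dataclass(frozen=True)
class DataElement:
    """Hypothetical in-memory form of one data element entry."""
    name: str
    value_type: str                                    # one of TYPES
    short_name: str
    listed: Tuple[str, ...] = ()                       # values listed in the domain
    value_range: Optional[Tuple[float, float]] = None  # inclusive numeric range
    free: bool = False                                 # domain ends in "free <type>"

    def __post_init__(self) -> None:
        if self.value_type not in TYPES:
            raise ValueError(f"unknown type {self.value_type!r}")
        if not SHORT_NAME.fullmatch(self.short_name):
            raise ValueError(f"short name {self.short_name!r} is not 1-8 letters")
        if self.value_range is not None and self.value_type not in ("integer", "real"):
            raise ValueError("a range domain needs an integer or real type")

    def accepts(self, value: str) -> bool:
        """Check a value against the element's type and domain."""
        if value in self.listed:
            return True  # a value provided by the domain is always valid
        if self.value_type in ("integer", "real"):
            try:
                number = int(value) if self.value_type == "integer" else float(value)
            except ValueError:
                return False  # not representable in the element's type
            if self.value_range is not None:
                low, high = self.value_range
                return low <= number <= high
        elif not value:
            return False  # empty text, date, or time
        # Unlisted values are acceptable only when the domain is "free".
        return self.free
```

For example, a hypothetical `DataElement("Example Format", "text", "exformat", listed=("ALPHA", "BETA"), free=True)` accepts both "ALPHA" and a producer-created "GAMMA", which mirrors the "list of values followed by free text" convention.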
Another issue is the representation of null values (representing such concepts as "unknown") in the domain. While this is relatively simple for textual entries (one would enter the text "Unknown"), it is not as simple for the integer, real, date, and time types (for example, which integer value means "unknown"?). Because conventions for providing this information vary among implementations, the standard specifies what concepts shall be represented, but does not mandate a means for representing them.
In addition to the values to be represented, the form of representation also is important, especially to applications that will manipulate the data elements. The following conventions for forms of values for data elements shall be used:
Calendar Dates (Years, Months, and Days)
- A.D. Era to December 31, 9999 A.D. -- Values for day and month of year, and for years, shall follow the calendar date convention (general forms of YYYY for years; YYYYMM for month of a year (with month being expressed as an integer), and YYYYMMDD for a day of the year) specified in American National Standards Institute, 1986, Representation for calendar date and ordinal date for information interchange (ANSI X3.30-1985): New York, American National Standards Institute (adopted as Federal Information Processing Standard 4-1).
- B.C. Era to 9999 B.C. -- Values for day and month of year, and for years, shall follow the calendar date convention, preceded by the lower case letters "bc" (general forms of bcYYYY for years; bcYYYYMM for month of a year (with month being expressed as an integer), and bcYYYYMMDD for a day of the year).
- B.C. Era before 9999 B.C. -- Values for the year shall consist of as many numeric characters as needed to represent the number of the year B.C., preceded by lower case letters "cc" (general form of ccYYYYYYY...).
- A.D. Era after 9999 A.D. -- Values for the year shall consist of as many numeric characters as needed to represent the number of the year A.D., preceded by the lower case letters "cd" (general form of cdYYYYYYY...).
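The four calendar date forms can be told apart by their prefixes and lengths. The sketch below is a hypothetical parser, not an official implementation; it assumes that the "cc" and "cd" forms are used only for years beyond 9999, it checks A.D. days with Python's `datetime.date`, and it only range-checks B.C. days because proleptic leap-year rules are out of scope here.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

_SHORT_FORM = re.compile(r"(bc)?(\d{4})(\d{2})?(\d{2})?")  # [bc]YYYY[MM[DD]]
_LONG_YEAR = re.compile(r"(cc|cd)(\d+)")                  # ccYYYYY..., cdYYYYY...


@dataclass(frozen=True)
class CalendarDate:
    era: str                    # "AD" or "BC"
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def parse_calendar_date(text: str) -> CalendarDate:
    long_year = _LONG_YEAR.fullmatch(text)
    if long_year:
        prefix, digits = long_year.groups()
        year = int(digits)
        if year <= 9999:  # assumption: cc/cd are reserved for years beyond 9999
            raise ValueError(f"{text!r}: use the four-digit form for this year")
        return CalendarDate("BC" if prefix == "cc" else "AD", year)

    match = _SHORT_FORM.fullmatch(text)
    if not match:
        raise ValueError(f"{text!r} is not a recognized calendar date form")
    bc, yyyy, mm, dd = match.groups()
    year = int(yyyy)
    month = int(mm) if mm else None
    day = int(dd) if dd else None
    if year == 0:
        raise ValueError("there is no year 0 in either era")
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"{text!r}: month out of range")
    if day is not None:
        if bc:
            if not 1 <= day <= 31:  # B.C. day is only range-checked here
                raise ValueError(f"{text!r}: day out of range")
        else:
            date(year, month, day)  # raises ValueError for impossible A.D. days
    return CalendarDate("BC" if bc else "AD", year, month, day)
```

Under these assumptions, "19980615" parses to 15 June 1998 A.D., "bc0753" to the year 753 B.C., and "cc12000" to the year 12000 B.C.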
Time of Day (Hours, Minutes, and Seconds)
Because some geospatial data and related applications are sensitive to time of day information, three conventions are permitted. Only one convention shall be used in the metadata for a data set. The conventions are:
- Local Time. For producers who wish to record time in local time, values shall follow the 24-hour timekeeping system for local time of day in the hours, minutes, seconds, and decimal fractions of a second (to the precision desired) without separators convention (general form of HHMMSSSS) specified in American National Standards Institute, 1986, Representations of local time of day for information interchange (ANSI X3.43-1986): New York, American National Standards Institute.
- Local Time with Time Differential Factor. For producers who wish to record time in local time and the relationship to Universal Time (Greenwich Mean Time), values shall follow the 24-hour timekeeping system for local time of day in hours, minutes, seconds, and decimal fractions of a second (to the resolution desired) without separators convention. This value shall be followed, without separators, by the time differential factor. The time differential factor expresses the difference in hours and minutes between local time and Universal Time. It is represented by a four-digit number preceded by a plus sign (+) or minus sign (-), indicating the hours and minutes by which local time is ahead of or behind Universal Time, respectively. The general form is HHMMSSSSshhmm, where HHMMSSSS is the local time using 24-hour timekeeping (expressed to the precision desired), 's' is the plus or minus sign for the time differential factor, and hhmm is the time differential factor. (This option allows producers to record local time and time zone information. For example, Eastern Standard Time has a time differential factor of -0500, Central Standard Time has a time differential factor of -0600, Eastern Daylight Time has a time differential factor of -0400, and Central Daylight Time has a time differential factor of -0500.) This option is specified in American National Standards Institute, 1975, Representations of universal time, local time differentials, and United States time zone reference for information interchange (ANSI X3.51-1975): New York, American National Standards Institute.
- Universal Time (Greenwich Mean Time). For producers who wish to record time in Universal Time (Greenwich Mean Time), values shall follow the 24-hour timekeeping system for Universal Time of day in hours, minutes, seconds, and decimal fractions of a second (expressed to the precision desired) without separators convention, with the upper case letter "Z" directly following the low-order (or extreme right hand) time element of the 24-hour clock time expression. The general form is HHMMSSSSZ, where HHMMSSSS is Universal Time using 24-hour timekeeping, and Z is the letter "Z". This option is specified in American National Standards Institute, 1975, Representations of universal time, local time differentials, and United States time zone reference for information interchange (ANSI X3.51-1975): New York, American National Standards Institute.
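The three time conventions differ only in their suffix, so a single pattern can recognize all of them. This is a hypothetical sketch: the function names are illustrative, it assumes that "to the precision desired" permits HH, HHMM, or HHMMSS followed by any number of fraction digits, and it ignores leap seconds. Because the time differential factor is how far local time is ahead of Universal Time, Universal Time is local time minus the factor; for example, 143000-0500 (14:30 Eastern Standard Time) corresponds to 193000Z.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# HH, then optional MM, then optional SS plus fraction digits, then an optional
# suffix: "Z" (Universal Time) or a signed four-digit time differential factor.
_TIME = re.compile(r"(\d{2})(?:(\d{2})(?:(\d{2})(\d*))?)?(Z|[+-]\d{4})?")


@dataclass(frozen=True)
class TimeOfDay:
    convention: str                    # "local", "local+tdf", or "universal"
    seconds: float                     # seconds after midnight on the stated clock
    tdf_minutes: Optional[int] = None  # signed time differential factor


def parse_time(text: str) -> TimeOfDay:
    match = _TIME.fullmatch(text)
    if not match:
        raise ValueError(f"{text!r} is not a recognized time of day form")
    hh, mm, ss, fraction, suffix = match.groups()
    hours, minutes = int(hh), int(mm or "0")
    seconds = float(f"{ss or '00'}.{fraction or '0'}")
    if hours > 23 or minutes > 59 or seconds >= 60:
        raise ValueError(f"{text!r} is out of range for a 24-hour clock")
    total = hours * 3600 + minutes * 60 + seconds
    if suffix is None:
        return TimeOfDay("local", total)
    if suffix == "Z":
        return TimeOfDay("universal", total)
    sign = 1 if suffix[0] == "+" else -1
    tdf = sign * (int(suffix[1:3]) * 60 + int(suffix[3:5]))
    return TimeOfDay("local+tdf", total, tdf)


def to_universal(t: TimeOfDay) -> float:
    """Universal Time in seconds after midnight (wraps across midnight)."""
    if t.convention == "universal":
        return t.seconds
    if t.tdf_minutes is None:
        raise ValueError("plain local time carries no offset to Universal Time")
    # Local time is ahead of UT by the TDF, so UT = local time - TDF.
    return (t.seconds - t.tdf_minutes * 60) % 86400


def single_convention(values: Iterable[str]) -> bool:
    """True if all times in one metadata record use the same convention."""
    return len({parse_time(v).convention for v in values}) <= 1
```

The `single_convention` helper reflects the rule that only one convention is used in the metadata for a data set; plain local time cannot be converted because it carries no offset.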
Latitude and Longitude
Values for latitude and longitude shall be expressed as decimal fractions of degrees. Whole degrees of latitude shall be represented by a two-digit decimal number ranging from 0 through 90. Whole degrees of longitude shall be represented by a three-digit decimal number ranging from 0 through 180. When a decimal fraction of a degree is specified, it shall be separated from the whole number of degrees by a decimal point. Decimal fractions of a degree may be expressed to the precision desired.
- Latitudes north of the equator shall be specified by a plus sign (+), or by the absence of a minus sign (-), preceding the two digits designating degrees. Latitudes south of the Equator shall be designated by a minus sign (-) preceding the two digits designating degrees. A point on the Equator shall be assigned to the Northern Hemisphere.
- Longitudes east of the prime meridian shall be specified by a plus sign (+), or by the absence of a minus sign (-), preceding the three digits designating degrees of longitude. Longitudes west of the prime meridian shall be designated by a minus sign (-) preceding the three digits designating degrees. A point on the prime meridian shall be assigned to the Eastern Hemisphere. A point on the 180th meridian shall be assigned to the Western Hemisphere. One exception to this last convention is permitted. For the special condition of describing a band of latitude around the earth, the East Bounding Coordinate data element shall be assigned the value +180 (180) degrees.
- Any spatial address with a latitude of +90 (90) or -90 degrees will specify the position at the North or South Pole, respectively. The component for longitude may have any legal value.
With the exception of the special condition described above, this form is specified in American National Standards Institute, 1986, Representations of Geographic Point Locations for Information Interchange (ANSI X3.61-1986): New York, American National Standards Institute.
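The hemisphere assignments above can be applied when coordinates are stored as decimal degrees. The sketch below is illustrative only: the function names and the `east_bounding_band` flag are hypothetical, and writing a point on the 180th meridian as -180 is this sketch's reading of "assigned to the Western Hemisphere". The formatter zero-pads whole degrees to two digits for latitude and three for longitude and always writes an explicit sign, which is one of the two permitted forms.

```python
def normalize_latitude(lat: float) -> float:
    """Validate a latitude; a point on the Equator is given as +0 (Northern)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside -90..90")
    return 0.0 if lat == 0 else float(lat)


def normalize_longitude(lon: float, east_bounding_band: bool = False) -> float:
    """Validate a longitude and apply the hemisphere assignments."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside -180..180")
    if lon == 0:
        return 0.0  # prime meridian: Eastern Hemisphere (never -0)
    if lon == 180 and not east_bounding_band:
        return -180.0  # 180th meridian: Western Hemisphere
    return float(lon)


def same_position(lat1: float, lon1: float, lat2: float, lon2: float) -> bool:
    """Compare two points; at either pole any legal longitude is equivalent."""
    lat1, lat2 = normalize_latitude(lat1), normalize_latitude(lat2)
    lon1, lon2 = normalize_longitude(lon1), normalize_longitude(lon2)
    if lat1 == lat2 and abs(lat1) == 90:
        return True
    return lat1 == lat2 and lon1 == lon2


def format_coordinate(value: float, whole_digits: int, places: int) -> str:
    """Sign, zero-padded whole degrees (2 for latitude, 3 for longitude), fraction."""
    width = whole_digits + (places + 1 if places > 0 else 0)
    sign = "-" if value < 0 else "+"
    return f"{sign}{abs(value):0{width}.{places}f}"
```

With these helpers, a longitude of 180 becomes -180 unless it is the East Bounding Coordinate of a band of latitude around the earth, and two points at the North Pole compare as equal whatever their longitudes.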
Network Addresses and File Names
Values for file names, network addresses for computer systems, and related services should follow the Uniform Resource Locator convention of the Internet when possible. See http://www.ncsa.uiuc.edu/demoweb/url-primer.html for additional details about the Uniform Resource Locator.
Optionality
The standard categorizes elements as being mandatory, mandatory-if-applicable, or optional as follows:
- Mandatory elements must be provided.
- Mandatory-if-applicable elements must be provided if the data set exhibits the defined characteristic.
- Optional elements are provided at the discretion of the metadata producer.
The optionality of a section or compound element always takes precedence over the elements that it contains. Once a section or compound element is recognized by the data set producer as applicable, then the optionality of its subordinate elements is to be interpreted. See Production Rules section for additional interpretive guidance.
Mandatory sections in the standard have some elements that are always required for all types of geospatial data sets. For comparison with other metadata standards, these elements are referred to as "core" elements.
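The precedence rule for optionality amounts to a top-down walk: the contents of a section or compound element are examined only when that element itself is provided. The following sketch assumes element names are unique and that the producer supplies two sets, the names of elements actually provided and the names of mandatory-if-applicable elements whose characteristic the data set exhibits. The `Element` class, the `missing` function, and the example tree are hypothetical and do not reproduce the standard's actual sections or optionality.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Element:
    name: str
    optionality: str  # "mandatory", "mandatory-if-applicable", or "optional"
    children: List["Element"] = field(default_factory=list)


def missing(parent: Element, provided: Set[str], applicable: Set[str]) -> List[str]:
    """Names of required elements absent under a parent that was provided.

    A child's own contents are examined only if the child is provided, so the
    optionality of a section or compound element takes precedence over the
    optionality of everything inside it.
    """
    problems: List[str] = []
    for child in parent.children:
        required = child.optionality == "mandatory" or (
            child.optionality == "mandatory-if-applicable" and child.name in applicable
        )
        if child.name in provided:
            problems.extend(missing(child, provided, applicable))
        elif required:
            problems.append(child.name)
    return problems
```

In a hypothetical record where an optional "supplemental" element contains a mandatory "note", omitting "supplemental" produces no complaint about "note"; providing "supplemental" without "note" does.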
Production Rules
A production rule specifies the relationship between a compound element and the data elements and other (lower-level) compound elements that compose it. Each production rule has a left side (identifier) and a right side (expression) connected by the symbol "=", meaning that the term on the left side is replaced by or produces the term on the right side. Terms on the right side are either other compound elements or individual data elements. By making substitutions using matching terms in the production rules, one can explain higher-level concepts using data elements. The symbols used in the production rules have the following meaning:
- "=" -- is replaced by, produces, consists of
- "+" -- and
- "[|]" -- selection: select one term from the list of enclosed terms (exclusive or); terms are separated by "|"
- "m{}n" -- iteration: the enclosed term(s) are repeated from "m" to "n" times
- "()" -- optional: the enclosed term(s) are optional
Examples:
- a = b + c -- "a consists of b and c"
- a = [b | c] -- "a consists of one of b or c"
- a = 4{b}6 -- "a consists of four to six occurrences of b"
- a = b + (c) -- "a consists of b and optionally c"
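The production-rule notation maps naturally onto a small grammar that can check whether a sequence of element names satisfies a rule. The sketch below is a hypothetical illustration, not a tool defined by the standard: `Elem`, `Seq`, `Choice`, `Repeat`, and `Opt` stand for a single term, "+", "[|]", "m{}n", and "()" respectively, and `high=None` as an unbounded upper limit is an assumption of this sketch. Matching tracks every position at which a partial match can end, so choices and repetitions need no explicit backtracking.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple, Union


@dataclass(frozen=True)
class Elem:          # a single data element or compound element name
    name: str

@dataclass(frozen=True)
class Seq:           # b + c + ...
    terms: Tuple["Rule", ...]

@dataclass(frozen=True)
class Choice:        # [b | c | ...]
    terms: Tuple["Rule", ...]

@dataclass(frozen=True)
class Repeat:        # m{term}n; high=None means no upper limit (assumption)
    low: int
    high: Optional[int]
    term: "Rule"

@dataclass(frozen=True)
class Opt:           # (term)
    term: "Rule"

Rule = Union[Elem, Seq, Choice, Repeat, Opt]


def _ends(rule: Rule, items: Sequence[str], start: int) -> Set[int]:
    """All positions where a match of `rule` beginning at `start` can end."""
    if isinstance(rule, Elem):
        hit = start < len(items) and items[start] == rule.name
        return {start + 1} if hit else set()
    if isinstance(rule, Seq):
        positions = {start}
        for term in rule.terms:
            positions = set().union(*(_ends(term, items, p) for p in positions))
        return positions
    if isinstance(rule, Choice):
        return set().union(*(_ends(term, items, start) for term in rule.terms))
    if isinstance(rule, Opt):
        return {start} | _ends(rule.term, items, start)
    # Repeat: apply the term up to `high` times, keeping ends reached at >= low.
    limit = rule.high if rule.high is not None else max(rule.low, len(items) + 1)
    current = {start}
    result = {start} if rule.low == 0 else set()
    for count in range(1, limit + 1):
        current = set().union(*(_ends(rule.term, items, p) for p in current))
        if not current:
            break
        if count >= rule.low:
            result |= current
    return result


def matches(rule: Rule, items: Sequence[str]) -> bool:
    """True if the whole sequence of element names satisfies the rule."""
    return len(items) in _ends(rule, items, 0)
```

The four examples above become `Seq((Elem("b"), Elem("c")))`, `Choice((Elem("b"), Elem("c")))`, `Repeat(4, 6, Elem("b"))`, and `Seq((Elem("b"), Opt(Elem("c"))))`. Note that `Repeat(0, 1, term)` and `Opt(term)` accept exactly the same sequences; the difference between "optional" and "mandatory if applicable" is not syntactic and has to be judged against what the data set actually exhibits.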
Interpreting the production rules:
- The terms bounded by parentheses, "(" and ")", are optional and are provided at the discretion of the data producer. If a producer chooses to provide information enclosed by parentheses, the producer shall follow the production rules for the enclosed information. For example, if the producer decides to provide the optional information described in the term (a + b + c), the producer shall provide a, b, and c.
Only for terms bounded by parentheses does the producer have the discretion of deciding whether or not to provide the information.
Geospatial data are produced and distributed in many ways, not all geospatial data have the same characteristics, and not all details of data sets that are in work or planned may yet be decided. Together, these factors created the need to express the concept of "mandatory if applicable." This concept means that if the data set exhibits (or, for data sets that are in work or planned, it is known that the data set will exhibit) a defined characteristic, then the producer shall provide the information needed to describe that characteristic. This concept is described by the production rule:
- 0{ term }1
Extensibility
Extended elements may be defined by a data set producer or a user community. Extended elements are elements outside the standard, but needed by the data set producer. If extended elements are created, they must follow the guidelines in Appendix D, Guidelines for creating extended elements to the Content Standard for Digital Geospatial Metadata.