ISO 19139 Identifiers

From NOAA Environmental Data Management Wiki

The ISO 19115, 19115-2, and 19119 metadata standards and the FGDC CSDGM are content standards. They describe the content that the metadata needs to include without defining the format or representation of that content. In the ISO cases, an XML representation is defined in the ISO 19139 Technical Specification (note: 19139 is usually referred to as a "Standard", but it is really an ISO Technical Specification). This is the de facto standard for representing these metadata.

The ISO 19139 specification uses some advanced XML techniques to add capabilities to ISO metadata. An important set of these capabilities is related to identifying and referring to "objects" in the metadata. Those capabilities are described here.

Objects and References

[Figure: Objects and References]

ISO XML consists of tags, elements (with or without content), and attributes. An attribute is a name/value pair that exists within a start-tag or empty-element tag. Attributes provide additional information about an element that is not part of the data. Attribute values must be enclosed in either single or double quotes. This example shows a step element with one attribute named number with a value of "3":

<step number="3">Connect A to B.</step>
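To make the split between attribute and content concrete, here is a minimal sketch that reads the step example with Python's standard xml.etree.ElementTree module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Parse the one-element example from the text.
step = ET.fromstring('<step number="3">Connect A to B.</step>')

# Attributes are exposed as name -> string value pairs.
number = step.get("number")

# The element content is kept separately from the attributes.
content = step.text

print(number)
print(content)
```

Note that the attribute value comes back as a string; any numeric interpretation is up to the application reading the XML.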

Many of the XML attributes used in the ISO Standards fall into two groups, identifiers and references:

  • Identifiers: id and uuid
  • References: uuidref and xlink:href

Objects start with upper-case letters and have identifiers (id and uuid).

Roles start with lower-case letters and have references (uuidref and xlink:href).

The Figure illustrates this idea with two people: Jane and John Doe. They are real-world items, that is, objects. In ISO metadata they have a type (CI_ResponsibleParty) and they have identifiers (JaneDoe and JohnDoe). They also have roles with respect to one another. Jane has a friend (the role) whose identifier is JohnDoe. This is indicated with the reference (xlink:href) to JohnDoe associated with the role of friend. John also has a friend (the role) whose identifier is JaneDoe, as indicated by the reference to JaneDoe associated with his friend role.

Understanding that roles start with lower-case letters and are fulfilled by objects that start with upper-case letters and have ids is one of the simple secrets of ISO XML. Keep it in mind!
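The naming convention is simple enough to check mechanically. The sketch below classifies element names by the case of their first letter. The helper names (local_name, kind), the namespace URI urn:example:sketch, and the simplified sample (text placed directly in individualName) are assumptions made for illustration and are not part of ISO 19139.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def kind(tag):
    """Classify an element name by the case of its first letter."""
    return "object" if local_name(tag)[0].isupper() else "role"

# Hypothetical, simplified sample; the namespace URI is a placeholder.
SAMPLE = """
<x:source xmlns:x="urn:example:sketch">
  <x:CI_ResponsibleParty id="JaneDoe">
    <x:individualName>Jane Doe</x:individualName>
  </x:CI_ResponsibleParty>
</x:source>
"""

root = ET.fromstring(SAMPLE)

# Walk the tree in document order: role, object, role.
kinds = [(local_name(el.tag), kind(el.tag)) for el in root.iter()]
for name, k in kinds:
    print(f"{name}: {k}")
```

The alternation of role, object, role in the output is the pattern to look for when reading any ISO XML record.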

The UML

[Figure: MD_MetadataExtensionInformation]

This Figure shows the UML model for the ISO MD_MetadataExtensionInformation class. This class is used to describe extensions to the standard that are being used in some record. They can be described in an OnlineResource or in a series of extendedElementInformation objects. Each extendedElementInformation object includes thirteen elements that describe the extension. The final one in the UML is source. This is a CI_ResponsibleParty that describes the person or organization that created the extension.

The Schema

One of the most disconcerting aspects of the 19139 XML Schema, particularly if you are used to looking at FGDC metadata, is the incredible amount of attention paid to the types of the XML elements described in the schema. This schema is "strongly typed" like the C or Fortran programming languages. Everything has a known type. The Perl programming language, in contrast, is "loosely typed", as is FGDC metadata. Loosely typed XML is easier to write, but ultimately more ambiguous. You do get used to the strong typing, but it takes some time!

The type of the XML object that defines the extendedElementInformation object is MD_ExtendedElementInformation_Type (this isn't really true, but we have to start somewhere). The definition of this type is shown below. Note that it almost matches the MD_ExtendedElementInformation class in the UML, but all of the types in the schema have "_PropertyType" appended to their names. For example, the "source" element at the bottom has a type "gmd:CI_ResponsibleParty_PropertyType". This is the first indication that something is up.

<xs:complexType name="MD_ExtendedElementInformation_Type">
	<xs:annotation>
		<xs:documentation>New metadata element, not found in ISO 19115, which is required to describe geographic data</xs:documentation>
	</xs:annotation>
	<xs:complexContent>
		<xs:extension base="gco:AbstractObject_Type">
			<xs:sequence>
				<xs:element name="name" type="gco:CharacterString_PropertyType"/>
				<xs:element name="shortName" type="gco:CharacterString_PropertyType" minOccurs="0"/>
				<xs:element name="domainCode" type="gco:Integer_PropertyType" minOccurs="0"/>
				<xs:element name="definition" type="gco:CharacterString_PropertyType"/>
				<xs:element name="obligation" type="gmd:MD_ObligationCode_PropertyType" minOccurs="0"/>
				<xs:element name="condition" type="gco:CharacterString_PropertyType" minOccurs="0"/>
				<xs:element name="dataType" type="gmd:MD_DatatypeCode_PropertyType"/>
				<xs:element name="maximumOccurrence" type="gco:CharacterString_PropertyType" minOccurs="0"/>
				<xs:element name="domainValue" type="gco:CharacterString_PropertyType" minOccurs="0"/>
				<xs:element name="parentEntity" type="gco:CharacterString_PropertyType" maxOccurs="unbounded"/>
				<xs:element name="rule" type="gco:CharacterString_PropertyType"/>
				<xs:element name="rationale" type="gco:CharacterString_PropertyType" minOccurs="0" maxOccurs="unbounded"/>
				<xs:element name="source" type="gmd:CI_ResponsibleParty_PropertyType" maxOccurs="unbounded"/>
			</xs:sequence>
		</xs:extension>
	</xs:complexContent>
</xs:complexType>
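One detail worth reading out of this schema segment is cardinality: XML Schema treats minOccurs and maxOccurs as 1 when they are omitted, so an element declared without minOccurs="0" is required. The following sketch, with a hypothetical cardinality helper, derives the occurrence range for three of the declarations copied from the segment above.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Three element declarations copied from the schema segment above.
EXCERPT = f"""
<xs:sequence xmlns:xs="{XS}">
  <xs:element name="name" type="gco:CharacterString_PropertyType"/>
  <xs:element name="shortName" type="gco:CharacterString_PropertyType" minOccurs="0"/>
  <xs:element name="source" type="gmd:CI_ResponsibleParty_PropertyType" maxOccurs="unbounded"/>
</xs:sequence>
"""

def cardinality(decl):
    """Return 'min..max' for an xs:element declaration."""
    # XML Schema defaults both minOccurs and maxOccurs to 1 when omitted.
    lo = decl.get("minOccurs", "1")
    hi = decl.get("maxOccurs", "1")
    return f"{lo}..{'*' if hi == 'unbounded' else hi}"

cards = {
    decl.get("name"): cardinality(decl)
    for decl in ET.fromstring(EXCERPT).findall(f"{{{XS}}}element")
}
for name, card in cards.items():
    print(f"{name}: {card}")
```

Read this way, name is required exactly once, shortName is optional, and at least one source must be present.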

The Objects

[Figure: MD_ExtendedElementInformation_Type]

This Figure shows all of the parts of the definition of the source element starting in the lower right and progressing up and to the left. The first is the AbstractObject_Type. This object has no content, but it includes a group of attributes called gco:ObjectIdentification (gco is the namespace where this group is defined, which I am ignoring for the sake of simplicity here). This group includes two identifiers: id and uuid (see note below). This means that any AbstractObject, regardless of the content, can be assigned one or both of these identifiers. Remember, objects start with upper-case letters and have ids.

The CI_ResponsibleParty object is of type CI_ResponsibleParty_Type, which extends (builds on) the AbstractObject_Type by adding the actual content of the object (individualName, organisationName, ...). Next we come to the CI_ResponsibleParty_PropertyType, the actual type of the source, which adds the nilReason attribute and the ObjectReference attribute group. The nilReason attribute is used to explain why a required element is not included in a particular XML record. It can have the values inapplicable, missing, template, unknown, and withheld. The ObjectReference attribute group includes uuidref and xlink:simpleLink. These are used to refer to objects that have the matching uuids or ids described earlier.
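As a rough illustration of nilReason, the sketch below checks whether an empty role element explains its missing content with one of the five values listed above. The helper names and the sample element are assumptions for illustration, and the check covers only the values named in this page, not everything the schema may permit.

```python
import xml.etree.ElementTree as ET

GCO = "http://www.isotc211.org/2005/gco"

# Only the values named in the text; other forms are not checked here.
NIL_REASONS = {"inapplicable", "missing", "template", "unknown", "withheld"}

def nil_reason(elem):
    """Return the gco:nilReason attribute value, or None if absent."""
    return elem.get(f"{{{GCO}}}nilReason")

def explains_absence(elem):
    """True when an empty element carries one of the listed reasons."""
    return len(elem) == 0 and nil_reason(elem) in NIL_REASONS

# Illustrative sample: a required source role with no content.
EMPTY_SOURCE = (
    '<gmd:source xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    f'xmlns:gco="{GCO}" gco:nilReason="unknown"/>'
)

source = ET.fromstring(EMPTY_SOURCE)
reason = nil_reason(source)
ok = explains_absence(source)
print(reason, ok)
```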

Finally, we come to the MD_ExtendedElementInformation_Type which is shown in the schema segment above. Notice that it also extends AbstractObject_Type, so it can also have a uuid or id assigned. The whole process is then repeated at the next level (and everywhere else in the 19139 schema).

The XML

Partial XML that complies with this schema is shown below for two extendedElementInformation objects. The first one includes a complete definition of Ted Habermann as the source for this extension. This CI_ResponsibleParty is labeled with id="Ted.Habermann" which has to be unique in the XML file. The second object comes from the same source, but there is no need to repeat the full definition. Instead, we can reference the local object called "Ted.Habermann".

    <gmd:extendedElementInformation>
        <gmd:MD_ExtendedElementInformation>
            <gmd:name>
                <gco:CharacterString>fileIdentifier</gco:CharacterString>
            </gmd:name>
            …
            <gmd:source>
                <gmd:CI_ResponsibleParty id="Ted.Habermann">
                    <gmd:individualName>
                        <gco:CharacterString>Ted Habermann</gco:CharacterString>
                    </gmd:individualName>
                    <gmd:organisationName>
                        <gco:CharacterString>DOC/NOAA > NOAA, U.S. Department of Commerce</gco:CharacterString>
                    </gmd:organisationName>
                    <gmd:contactInfo>
                        <gmd:CI_Contact>
                            <gmd:address>
                                <gmd:CI_Address>
                                    <gmd:electronicMailAddress>
                                        <gco:CharacterString>ted.habermann@noaa.gov</gco:CharacterString>
                                    </gmd:electronicMailAddress>
                                </gmd:CI_Address>
                            </gmd:address>
                        </gmd:CI_Contact>
                    </gmd:contactInfo>
                    <gmd:role>
                        <gmd:CI_RoleCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="originator"/>
                    </gmd:role>
                </gmd:CI_ResponsibleParty>
            </gmd:source>
        </gmd:MD_ExtendedElementInformation>
    </gmd:extendedElementInformation>
    <gmd:extendedElementInformation>
        <gmd:MD_ExtendedElementInformation>
            …
            <gmd:source xlink:href="#Ted.Habermann"/>
        </gmd:MD_ExtendedElementInformation>
    </gmd:extendedElementInformation>
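A consumer of this XML has to follow the local reference back to the object it names. The sketch below shows one way that might be done with Python's standard library. It uses a trimmed copy of the record, wrapped in a hypothetical root element that declares the namespaces, and the resolve helper is an assumption for illustration. It handles only same-document references of the form #id.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = f"{{{NS['xlink']}}}href"

# Trimmed sample; the <root> wrapper is hypothetical.
DOC = """
<root xmlns:gmd="http://www.isotc211.org/2005/gmd"
      xmlns:gco="http://www.isotc211.org/2005/gco"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:source>
    <gmd:CI_ResponsibleParty id="Ted.Habermann">
      <gmd:individualName>
        <gco:CharacterString>Ted Habermann</gco:CharacterString>
      </gmd:individualName>
    </gmd:CI_ResponsibleParty>
  </gmd:source>
  <gmd:source xlink:href="#Ted.Habermann"/>
</root>
"""

root = ET.fromstring(DOC)

# Index every object that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

def resolve(role):
    """Return the object filling a role, inline or via a local xlink:href."""
    if len(role):  # the object is written inline
        return role[0]
    href = role.get(HREF, "")
    return by_id.get(href[1:]) if href.startswith("#") else None

names = [
    resolve(src).findtext("gmd:individualName/gco:CharacterString", namespaces=NS)
    for src in root.findall("gmd:source", NS)
]
print(names)
```

Both source roles resolve to the same CI_ResponsibleParty object, which is the point of defining it once and referencing it afterwards.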

IDs and UUIDs

The identifiers used in the id attribute are XML Names which have significant restrictions. They must begin with a letter or an _, and, after the first character, be composed only of letters, digits, ., _, and -.
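A candidate id can be checked against these rules before it is written into a record. This is a minimal sketch that assumes ASCII letters only. XML itself accepts many non-ASCII letters, so the pattern is stricter than the specification, and the names ID_PATTERN and is_valid_id are illustrative.

```python
import re

# First character: a letter or underscore.
# Remaining characters: letters, digits, '.', '_' or '-'.
# ASCII-only simplification of the rule described above.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_id(value):
    """Return True if value satisfies the simplified id rule."""
    return ID_PATTERN.fullmatch(value) is not None

for candidate in ["Ted.Habermann", "Ted Habermann", "3rdParty", "_contact-1"]:
    print(candidate, is_valid_id(candidate))
```

Spaces and leading digits are the most common reasons a human-readable label fails as an id.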

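The UUIDs are Universally Unique Identifiers, which also have special characteristics. A UUID is a 128-bit value, usually written as 32 hexadecimal digits in five hyphen-separated groups (8-4-4-4-12), that can be generated without a central registry.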
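Most languages can generate UUIDs directly. As a small sketch, Python's standard uuid module produces a random (version 4) UUID and parses its text form back. Whether a random UUID is the right kind for a given metadata workflow is an assumption left to the reader.

```python
import uuid

# Generate a random (version 4) UUID.
new_id = uuid.uuid4()

# The canonical text form is what would go into a uuid attribute.
text = str(new_id)

# Parsing the text form gives back an equal UUID object.
parsed = uuid.UUID(text)

print(text)
print(parsed == new_id)
```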

IDs and Labels

The ubiquitous nature of ids in the ISO standards makes it attractive to think of them, and use them, as labels of objects. In many cases this does not work out very well because of the limitations on the character set and the requirement for uniqueness. This is particularly true in situations where objects include description elements. In those cases, use the description elements as labels instead of the ids.