ISO Keywords

From NOAA Environmental Data Management Wiki

Keywords are an important part of most metadata standards as they provide a mechanism for attaching shared vocabularies to metadata records. The ISO Keywords include three elements:
  1. a set of related keywords
  2. a type of the keywords
  3. a citation to the source (thesaurus) for the keywords

The ISO keywords are implemented so that communities can use the keyword dictionaries appropriate for them and connect to those vocabularies through a citation in the keyword thesaurus (the thesaurusName element). Communities can also define appropriate keyword types by extending the MD_KeywordTypeCode codelist.

Keyword Thesaurus

The ISO Metadata Standards include the capability to provide a citation to the source of the keywords. These citations are used when the keywords are coming from a shared vocabulary or keyword list. The NASA Global Change Master Directory (GCMD) provides a number of commonly used keyword lists. Citations to shared keyword vocabularies are good candidates for components.
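
As a rough illustration of how the three elements fit together, the following is a minimal sketch of a keyword block in ISO 19139 XML. The keyword value, the thesaurus title, and the codeList URL are assumptions chosen for illustration and are not taken from a real record; the citation date is marked as unknown rather than invented.

```xml
<gmd:descriptiveKeywords
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:MD_Keywords>
    <!-- 1. the keyword(s); repeat gmd:keyword for each related term (value is illustrative) -->
    <gmd:keyword>
      <gco:CharacterString>EARTH SCIENCE > OCEANS</gco:CharacterString>
    </gmd:keyword>
    <!-- 2. the type of the keywords (codeList URL is an assumption) -->
    <gmd:type>
      <gmd:MD_KeywordTypeCode
          codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_KeywordTypeCode"
          codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
    </gmd:type>
    <!-- 3. the citation to the source (thesaurus) of the keywords -->
    <gmd:thesaurusName>
      <gmd:CI_Citation>
        <gmd:title>
          <!-- illustrative title, not copied from an official citation -->
          <gco:CharacterString>NASA/GCMD Earth Science Keywords</gco:CharacterString>
        </gmd:title>
        <!-- a citation date is required; flagged as unknown here instead of guessing -->
        <gmd:date gco:nilReason="unknown"/>
      </gmd:CI_Citation>
    </gmd:thesaurusName>
  </gmd:MD_Keywords>
</gmd:descriptiveKeywords>
```

Because the thesaurus citation is the same for every record that draws on a given keyword list, it is the part of this block most likely to be worth sharing as a component.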

A Keyword Question

Hello All, I am hoping that you can help me make a decision about this question!

I work with various communities to develop metadata in the ISO 19115/19115-2 model using ISO 19139 XML. Part of this process entails identifying established vocabularies, or creating new ones, for free text fields in the metadata standard, and I am looking for the best way to preserve information about the chosen vocabulary. The ISO keywords section makes it possible to identify the keyword authority in a citation. However, these vocabulary values may reside in ANY free text field of the standard, such as organisationName, positionName, hierarchyLevelName...

In my studies and conversations so far, we've come up with a variety of possible but not quite perfect solutions (see below), and we hesitate to settle on one method without consulting the collective you.

I would greatly appreciate it if you can share with me your experiences, best practices or thoughts on the question and potential options.

Thanks in advance and I look forward to your response, Anna.Milan@noaa.gov



The Question - How to preserve the information about a controlled vocabulary used for ANY free text field represented as <gco:CharacterString> in XML. The basic pieces of information are the vocabulary URL, the vocabulary id and the vocabulary word.

This is what a snippet of XML might look like without identifying the vocabulary used; it serves as the base for the options below. In these examples, the vocabulary source is GCMD Data Center Keywords, the URL points to a PDF of keywords and the id value is fake.

<gmd:organisationName><gco:CharacterString>DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce</gco:CharacterString> </gmd:organisationName>

Solutions that don't extend the standard -

  1. gmd:LocalisedCharacterString
    • Valid XML substitute for gco:CharacterString. This is my favorite option so far, but I think its official purpose is to name the language used in that particular field. Can a keyword list be considered a language? Do you know of any metadata that uses this field?
    • <gmd:organisationName><gmd:LocalisedCharacterString id="1234" locale="http://gcmd.nasa.gov/Resources/valids/archives/GCMD_Data_Center_Keywords.pdf">DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce</gmd:LocalisedCharacterString></gmd:organisationName>
  2. gmd:LanguageCode
    • Valid XML substitute for gco:CharacterString, but I think its official purpose is the same as that of LocalisedCharacterString? Ideally, the vocabulary would be formatted like an ISO codelist.
    • <gmd:LanguageCode codeSpace="1234" codeList="http://gcmd.nasa.gov/Resources/valids/archives/GCMD_Data_Center_Keywords.pdf" codeListValue="DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce">DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce</gmd:LanguageCode>
  3. gmd:MD_KeywordTypeCode
    • Valid XML substitute for gco:CharacterString, but I think its official purpose is to identify the type of keywords, such as theme or place. Ideally, the vocabulary would be formatted like an ISO codelist.
    • <gmd:MD_KeywordTypeCode codeSpace="1234" codeList="http://gcmd.nasa.gov/Resources/valids/archives/GCMD_Data_Center_Keywords.pdf" codeListValue="DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce">DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce</gmd:MD_KeywordTypeCode>
  4. prepend a URN/reverse namespace to the vocabulary value in the field
    • Completely valid, but it may complicate HTML presentation and validation of the keyword value against the URL, and I don't think the vocabulary id fits anywhere. Benefit: this is a commonly advertised practice.
    • <gmd:organisationName><gco:CharacterString>gov.nasa.gcmd.data_center:DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce</gco:CharacterString> </gmd:organisationName>
  5. make the entire field value a URL that ends with the vocabulary value
    • Completely valid, but it may complicate HTML presentation and validation of the keyword value against the URL, and I don't think the vocabulary id fits anywhere. Works better for codelists.
    • <gmd:organisationName><gco:CharacterString>http://gcmd.nasa.gov/Resources/valids/archives/GCMD_Data_Center_Keywords.pdf#gov.nasa.gcmd.data_center:DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce</gco:CharacterString> </gmd:organisationName>