Medical Subject Headings
The Global Citation Maintenance (GCM) data in XML format makes available the annual changes which are made by NLM in the MeSH indexing of citations in PubMed and distributed MEDLINE. Users of other systems that use MeSH for subject indexing may also find the GCM data helpful for their indexed documents, but they must be aware of relevant differences from the NLM database. For example, the searches required by manual tasks are specific to PubMed syntax.
The MeSH vocabulary is updated annually. The primary goal of citation maintenance is to ensure that the existing MeSH indexing of the citations is consistent with the current version of the MeSH vocabulary while retaining the intent of the existing indexing. Changes in MeSH which may impact citations are: (a) deletions of MeSH headings, and (b) changes in the preferred term of a MeSH heading. Indexing terms which have been deleted or replaced in the MeSH vocabulary must themselves be removed or replaced in the citation in order to remain consistent with MeSH. Citation maintenance is concerned with how to appropriately replace the old reference.
Citations or other documents indexed with MeSH terms are usually indexed by the MeSH term or the MeSH Unique Identifiers (UIs) which refer to a MeSH vocabulary record. The GCM data are intended to provide sufficient information to allow systems using either terms or UIs to be updated correctly.
In the past, the MeSH Section has made available lists of Deleted Headings (deleted Descriptor records) and Replaced Headings (changes in Descriptor preferred terms). However, no information has been available for changes in Supplementary Concept Records (SCRs), nor has more detailed record information, such as the unique identifier, been included.
Citation maintenance is accomplished by "tasks" - database transactions which make a specific change in the indexing of a set of citations. See section 3 for types of tasks, section 6 for a detailed description of task elements. One of the essential features of executing these tasks is the relative order among tasks of different types as well as the order of required citation queries. See section 4 for a fuller account of the sequence of tasks and queries. A chart is also available which represents the maintenance procedure graphically.
GCM data represent annual changes in the MeSH vocabulary which are available in MEDLINE by January of each year. Annual changes in Supplementary Concept Records (SCRs), especially changes affecting Descriptors, are included in the data, though SCR changes made regularly throughout the year are not currently included.
For a detailed explanation of the GCM file formats, see sections 5 and 6, below. The format of the task data, and how the data are to be used, depends on the type of task, as explained in the following sections.
3.1 Updating the indexing - the MeSH preferred term and UI
Indexing with MeSH headings consists in the assignment to a citation of a reference to a MeSH Descriptor, Qualifier, or Supplementary Concept Record (SCR). The reference may be either: (a) the preferred term in the record, for example, 'Heart Arrest', or (b) an alphanumeric unique identifier (UI) for the MeSH record, for example, 'D006323'. Citations in NLM's Medline XML, for example, use the preferred term in the <MeshHeading>, <NameOfSubstance>, and <QualifierName> elements. Other systems may index with only the MeSH UI and not the MeSH term. To accommodate both types of indexing, the GCM data include both a MeSH UI and the corresponding preferred term for every update action.
Specific "tasks" or transactions are created to change the MeSH indexing in a citation. A task either: (a) replaces an existing MeSH reference with another, (b) adds a reference, or (c) deletes a reference.
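The three kinds of change can be illustrated with a minimal Python sketch. It assumes that a citation's indexing is held as a plain list of MeSH UIs; the `apply_action` helper and that list representation are hypothetical and are not part of the GCM data or of any NLM software.

```python
def apply_action(refs, action, existing=None, new=None):
    """Return a new list of MeSH references after one maintenance task.

    refs     -- list of MeSH UIs (or terms) indexed on one citation
    action   -- "Replace", "Add", or "Delete"
    existing -- reference being replaced or deleted (None for Add)
    new      -- reference being added or substituted (None for Delete)
    """
    if action == "Replace":
        out = [new if r == existing else r for r in refs]
    elif action == "Add":
        out = list(refs) + [new]
    elif action == "Delete":
        out = [r for r in refs if r != existing]
    else:
        raise ValueError(f"unknown action: {action}")
    # Drop duplicates while keeping order, e.g. when the replacement
    # reference was already present on the citation.
    seen = set()
    return [r for r in out if not (r in seen or seen.add(r))]


# Replace the deleted Methanogens record with Euryarchaeota (UIs from the text).
updated = apply_action(["D008699", "D006323"], "Replace",
                       existing="D008699", new="D019605")
```

Returning a new list rather than editing in place makes it easy to compare a citation's indexing before and after a task.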
3.2 Main types of tasks
Maintenance tasks are divided into three categories that reflect the source of the task. This affects the order in which the task is executed and its scope.
When the preferred term in a MeSH record has changed, indexing by MeSH term must be replaced by the new preferred term. For example, in 2004 MeSH the preferred term for the heading Green Sulfur Bacteria was changed to 'Chlorobi'. This is essentially a name change and is usually the most transparent of indexing changes.
Preferred term tasks are applied to every citation in the database and always replace an existing preferred term with a different preferred term.
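For a term-indexed system, a Preferred Term task amounts to a rename across the whole database. The following Python sketch assumes citations are stored as a dictionary mapping a citation ID to its list of MeSH terms; the storage layout, the citation IDs, and the `rename_term` helper are illustrative assumptions only.

```python
def rename_term(citations, old_term, new_term):
    """Replace old_term with new_term on every citation; return the count changed."""
    changed = 0
    for terms in citations.values():
        for i, term in enumerate(terms):
            if term == old_term:
                terms[i] = new_term
                changed += 1
    return changed


# Hypothetical two-citation database using the 2004 example from the text.
citations = {
    "cit1": ["Green Sulfur Bacteria", "Heart Arrest"],
    "cit2": ["Heart Arrest"],
}
count = rename_term(citations, "Green Sulfur Bacteria", "Chlorobi")
```

A UI-indexed system would not need this step, since the UI of the record is unchanged by a preferred term change.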
When a MeSH record is deleted, references to the record are usually replaced with references to a different MeSH record. For example, in 2004 MeSH the Descriptor record for Methanogens (UI = D008699) was deleted. Existing citation references were replaced with references to another record Euryarchaeota (UI = D019605). These tasks are called automatic because the replacement is determined by algorithm, though the replacement is originally specified by the MeSH subject specialist when the MeSH record is deleted.
Automatic tasks are applied to every citation in the database and either replace an existing value with a new value, or delete the old value altogether.
Note that the result of applying Automatic tasks is that every MeSH record referenced in the citations is valid in the new MeSH year. Combined with the application of Preferred Term changes, the result is that all citation references to MeSH records are valid MeSH terms or UIs for the new MeSH year. (This assumes that citation references prior to maintenance were valid for the previous MeSH year.)
This type of task is called "Manual" because a MeSH specialist determines the proper maintenance on a case-by-case basis. Manual tasks are often used to refine the results of a previously-run Automatic task. For this reason, Manual tasks must be run after Automatic tasks. (Thus a Manual task may apply to data introduced by a previously run Automatic task.)
While Automatic and Preferred Term tasks are applied to every citation in the database, Manual tasks apply only to citations identified by searches in GCM_SEARCH.XML. A Manual task may replace an existing value with a new value, but may also just add a value or just delete a value. Manual tasks are not essential for preserving valid MeSH references, but they are necessary for preserving the intent of the existing indexing.
The order in which the tasks and queries must be performed can be critical because a task or query may be affected by a previous task. This is especially true when the indexing is done with MeSH terms rather than by MeSH Unique Identifiers (UIs), since terms may be changed without a change in UI.
4.1 Queries for Manual tasks are run before maintenance.
Whether indexing by MeSH term or UI, if Manual tasks are to be used, the queries for the Manual tasks must be independent of later maintenance tasks. This is because the queries used to restrict the application of Manual tasks refer to MeSH terms in the previous year's MeSH and so could be affected by either the Automatic tasks or Preferred Term changes implemented after the queries are formulated. So the queries must be independent of these changes. There are at least two ways to do this. NLM uses the first method.
4.2 Automatic tasks
Automatic tasks are the principal maintenance tasks and the first tasks to be done. Manual tasks are run after the Automatic and Preferred Term tasks because the manual tasks are written to supplement or adjust those results. The order among Automatic tasks does not matter since one Automatic task cannot impact another Automatic task - the maintained-to Descriptor cannot be a deleted record.
4.3 Preferred Term tasks - run after Automatic tasks but before Manual tasks
When updating indexing by term, rather than indexing by UI, it is possible for a Preferred Term task to impact an Automatic task. Therefore, Preferred Term tasks must be run after Automatic tasks.
However, Manual tasks are written with the expectation that Automatic and Preferred Term tasks have already been run. Therefore, Preferred Term tasks must be completed before Manual tasks.
As noted earlier, changes in the MeSH preferred term are implemented only for systems that index by MeSH term rather than MeSH Unique Identifier (UI). However, systems that index with MeSH UI must have available a database of MeSH terms for the new MeSH year in order to display or otherwise produce the appropriate preferred term.
4.4 Manual tasks - run after Preferred Term tasks
Manual tasks are usually created to supplement Automatic tasks. They are written with the assumption that the Automatic tasks have already run, and so are always run against the citation database after the Automatic tasks. For similar reasons, Manual tasks are run after Preferred Term tasks.
4.5 Summing up the order of processing
The following table summarizes the steps required for updating a term-indexed database. The processing will be the same for UI-indexed databases except that step (3) - PrefTerm tasks - will not be applicable. A chart is also available which represents the maintenance procedure graphically.
Process | Description | Sequence |
---|---|---|
1. Queries for Manual tasks | Retrieve sets of citations to be used to specify the range of Manual tasks to be run later. | Query results must be obtained first, since later maintenance could affect the queries, which are written for the previous year's MeSH. |
2. Automatic tasks | Replace all references to deleted MeSH records with references to other MeSH records. | Must be run before Manual tasks since Manual tasks are written to supplement Automatic tasks. |
3. PrefTerm tasks | Replace MeSH preferred term with a different preferred term. | Must be run after Automatic tasks to avoid impacting these tasks. |
4. Manual tasks | Supplement Automatic tasks, usually by adding additional references. Applied to citations previously obtained by query. | Must be run after Automatic tasks, applied to citations identified earlier by queries for each Manual task. |
The <Sequence> element in the GCM XML is designed to ensure this order, as well as the order among Manual tasks.
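The processing order can be sketched in Python as follows. This is a simplified illustration for a UI-indexed store, not NLM's implementation. It assumes each task has already been loaded into a dictionary keyed by the GCM element and attribute names, and that the citation sets for Manual tasks were collected by running the searches before any maintenance. The `run_maintenance` function, the citation IDs, and the sample tasks are hypothetical.

```python
def run_maintenance(citations, tasks, manual_matches):
    """Apply tasks in <Sequence> order to a UI-indexed citation store.

    citations      -- dict: citation ID -> list of MeSH UIs
    tasks          -- list of dicts keyed by GCM element/attribute names
    manual_matches -- dict: MTaskID -> set of citation IDs, obtained by
                      running the searches BEFORE maintenance (step 1)
    """
    # sorted() is stable, so tasks sharing a Sequence value keep file order.
    for task in sorted(tasks, key=lambda t: int(t["Sequence"])):
        if task["TaskSourceType"] == "Manual":
            scope = manual_matches.get(task["MTaskID"], set())
        else:
            scope = list(citations)  # Automatic and PrefTerm: every citation
        old, new = task.get("ExistingMeSHUI"), task.get("NewMeSHUI")
        for cid in scope:
            refs = citations.get(cid)
            if refs is None:
                continue
            if task["Action"] == "Add":
                if new not in refs:
                    refs.append(new)
            elif old in refs and old != new:  # PrefTerm tasks are no-ops here
                i = refs.index(old)
                if task["Action"] == "Delete" or new in refs:
                    del refs[i]  # delete, or replacement already present
                else:
                    refs[i] = new
    return citations


# Made-up tasks, deliberately listed out of order.
tasks = [
    {"MTaskID": "M1", "TaskSourceType": "Manual", "Action": "Add",
     "NewMeSHUI": "D006323", "Sequence": "3"},
    {"MTaskID": "A1", "TaskSourceType": "Automatic", "Action": "Replace",
     "ExistingMeSHUI": "D008699", "NewMeSHUI": "D019605", "Sequence": "1"},
]
citations = {"cit1": ["D008699"], "cit2": ["D008699", "D006323"]}
result = run_maintenance(citations, tasks, manual_matches={"M1": {"cit1"}})
```

Sorting on `Sequence` means the caller does not need to know the task types in order to get the Automatic, Preferred Term, Manual ordering right.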
GCM data are distributed in two files.
GCM.XML. The main file includes a list of every maintenance task, with the old and new values, MeSH UI, etc. See below for a more detailed description of the elements.
GCM_SEARCH.XML. Some maintenance tasks apply only to a specified subset of the database and so they require a search description that narrows the scope of the task. This file is a list of the searches (in PubMed format) for each of the Manual tasks.
In practice the file names will reflect the MeSH year of annual changes. So, for example, for 2004 MeSH, the files will be GCM2004.XML and GCM_SEARCH2004.XML.
The XML structure for GCM.XML is relatively simple, with only two element levels and two attributes. See the GCM2004.DTD and sample GCM2004.XML file. The GCM_SEARCH.XML file is even simpler, with a task ID mapping the search to the corresponding task in the GCM.XML file. See GCM_SEARCH2004.DTD and sample GCM_SEARCH2004.XML file. See also the more detailed data element descriptions for both sets of files, below.
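As a rough illustration of this two-level structure, the following Python sketch loads tasks with the standard library's `xml.etree.ElementTree`. The sample fragment is hypothetical: it is assembled from the element names described in this document, and the pairing of task ID A2 with these values is invented. The exact element order and content model should be checked against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built from the element names in this document.
SAMPLE = """<CitMaintTaskSet>
  <CitMaintTask Action="Replace" TaskSourceType="Automatic">
    <MTaskID>A2</MTaskID>
    <MeSHYear>2004</MeSHYear>
    <ExistingMeSHUI>D008699</ExistingMeSHUI>
    <NewMeSHUI>D019605</NewMeSHUI>
    <ExistingMeSHPrefTerm>Methanogens</ExistingMeSHPrefTerm>
    <NewMeSHPrefTerm>Euryarchaeota</NewMeSHPrefTerm>
    <ExistingMeSHRecType>DESCRIPTOR</ExistingMeSHRecType>
    <NewMeSHRecType>DESCRIPTOR</NewMeSHRecType>
    <Sequence>1</Sequence>
  </CitMaintTask>
</CitMaintTaskSet>"""


def load_tasks(xml_text):
    """Return one dict per <CitMaintTask>, merging attributes and subelements."""
    root = ET.fromstring(xml_text)
    tasks = []
    for node in root.iter("CitMaintTask"):
        task = dict(node.attrib)  # Action, TaskSourceType
        for child in node:
            # Treat empty elements as None (the "Null" cases in the tables).
            task[child.tag] = (child.text or "").strip() or None
        tasks.append(task)
    return tasks


tasks = load_tasks(SAMPLE)
```

For a file on disk, `ET.parse(path).getroot()` could be used in place of `ET.fromstring`.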
Data are encoded in UTF-8 format. Currently the data are also compatible with 7-bit ASCII encoding.
Files are also available for all MeSH records in XML format. Medline and other NLM data in XML format are also available.
The following two tables list each XML element and attribute for the two files, with a brief description. Following the tables, there is a more discursive description of the elements, including examples in XML format.
6.1 Synopsis of XML elements
The following is a list of GCM elements in tabular format, with a brief description of each.
GCM.XML
Element/attribute | Value Range | Description |
---|---|---|
CitMaintTaskSet | | Set of all tasks. Root element. |
CitMaintTask | | Specific task to replace, add, or delete indexing data. |
/Action | Replace, Add, Delete | Nature of the change to the citation. |
/TaskSourceType | Manual, Automatic, PrefTerm | Process by which task was created. |
MTaskID | M..., A...., P.... | Unique identifier for the task. Leading alphabetic, remainder numeric. |
MeSHYear | (YYYY) | Year when annual MeSH changes first appear in January. |
ExistingMeSHUI | D......, C......, Q...... | UI of the MeSH record reference being replaced or deleted. Null when Action is Add. Same value as NewMeSHUI for PrefTerm change. |
NewMeSHUI | D......, C......, Q...... | UI of the MeSH record reference replacing the old value, or being added. Null when Action is Delete. Same value as ExistingMeSHUI when only the preferred term is being changed. May include attached Qualifier UI. |
ExistingMeSHPrefTerm | (string) | Preferred term for ExistingMeSHUI. |
NewMeSHPrefTerm | (string) | Preferred term for NewMeSHUI. |
ExistingMeSHRecType | DESCRIPTOR, SCR, QUALIFIER | MeSH record type of ExistingMeSHUI. |
NewMeSHRecType | DESCRIPTOR, SCR, QUALIFIER | MeSH record type of NewMeSHUI. |
MajorTopicYN | Y, N | New value may be marked as the major topic of the citation. |
Sequence | (positive integer) | Order in which tasks must be run. |
GCM_SEARCH.XML
Element/attribute | Value Range | Description |
---|---|---|
CitMaintSearchSet | | Set of all searches for Manual tasks. Root element. |
CitMaintSearch | | Information needed to identify the search which is needed to apply a Manual task in GCM.XML. |
MTaskID | M..., A...., P.... | Maps search to Manual task in GCM.XML having the same <MTaskID>. |
MeSHYear | (YYYY) | Year when annual MeSH changes first appear in January. Not the MeSH year of the MeSH terms in the search, which is one year previous to <MeSHYear>. |
SearchPubMed | (free text) | Search limiting application of a Manual task. Must be run prior to any maintenance. |
6.2 Alphabetic List of XML elements
The following are the elements in the two XML files.
Action
Description: Nature of the change to the citation. One of the
following: Replace, Add, Delete.
Example:
<CitMaintTask Action="Replace" TaskSourceType="Automatic">
Subelement of: n/a; attribute of <CitMaintTask>
In file: GCM.XML.
Required element: yes
<CitMaintSearch>
Description: Information needed to apply a citation search to a
given Manual task. Used to restrict the application of a Manual
task to a given set of citations. The search applies to the
Manual task in the GCM.XML which has the same
<MTaskID>.
Subelement of: <CitMaintSearchSet>.
In file: GCM_SEARCH.XML.
Required element: yes
<CitMaintSearchSet>
Description: Set of all <CitMaintSearch> elements in
GCM_SEARCH.XML. Root element.
Subelement of: none; this is the root element of the
GCM_SEARCH.XML.
In file: GCM_SEARCH.XML.
Required element: yes
<CitMaintTask>
Description: Transaction consisting of all the information needed
to change an instance of MeSH-indexing in a citation
record.
Subelement of: <CitMaintTaskSet>
In file: GCM.XML.
Required element: yes
<CitMaintTaskSet>
Description: The set of all <CitMaintTask> elements in the
GCM.XML file.
Subelement of: none; this is the root element of the
GCM.XML.
In file: GCM.XML.
Required element: yes
<ExistingMeSHPrefTerm>
Description: Preferred term in MeSH for <ExistingMeSHUI>.
Null when Action is Add. Critical for PrefTerm changes, may be
redundant for Automatic and Manual changes. May be the same as
<NewMeSHPrefTerm> in the same task when TaskSourceType is
Manual or Automatic. Example:
<ExistingMeSHPrefTerm>Aborigines</ExistingMeSHPrefTerm>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
<ExistingMeSHRecType>
Description: The MeSH record type of the <ExistingMeSHUI>.
One of DESCRIPTOR, QUALIFIER, SCR. Null when Action is Add.
Redundant in that the record type may be inferred from the
initial character of <ExistingMeSHUI> (D, Q, C). Designed
to make it easier for users of XML to extract actions pertaining
to only one record type. May be different from
<NewMeSHRecType> in the same task.
Example:
<ExistingMeSHRecType>SCR</ExistingMeSHRecType>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
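Because the record type can be inferred from the first character of the UI, a consumer of the data could cross-check the two values. The following Python sketch shows one way to do this; the `rec_type_from_ui` helper is hypothetical and is not part of the GCM distribution.

```python
# Prefix-to-type mapping as stated in the element description (D, Q, C).
REC_TYPE_BY_PREFIX = {"D": "DESCRIPTOR", "Q": "QUALIFIER", "C": "SCR"}


def rec_type_from_ui(ui):
    """Infer the MeSH record type from the initial character of a UI."""
    try:
        return REC_TYPE_BY_PREFIX[ui[0]]
    except (KeyError, IndexError):
        raise ValueError(f"unrecognized MeSH UI: {ui!r}") from None


inferred = rec_type_from_ui("C039562")
```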
<ExistingMeSHUI>
Description: UI of the MeSH record reference being replaced or
deleted. Matches the seven-character string in a
<DescriptorUI>, <SupplementalRecordUI>, or
<QualifierUI>. Null when Action is Add. Same value as
<NewMeSHUI> in the same task for PrefTerm change. Not
necessarily in the previous year of MeSH but could be an
intermediate value in the maintenance process.
Example:
<ExistingMeSHUI>C039562</ExistingMeSHUI>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
<MajorTopicYN>
Description: Medline indexing includes an optional indicator for
Descriptors representing a main point of a citation. So in a
maintenance task which adds a reference to a citation (Add or
Replace), major topic of the citations may be indicated by a "Y"
value. (Cf. Medline MajorTopicYN, which is an attribute of the
<DescriptorName>, rather than a separate element. The
GCM.XML uses a separate element for the MajorTopicYN rather than
make it an attribute of two elements - the
<NewMeSHPrefTerm> and the <NewMeSHUI>.)
Example:
<MajorTopicYN>Y</MajorTopicYN>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
<MeSHYear>
Description: Year when annual MeSH changes first appear in
January. All "new" data in the XML will be consistent with MeSH
data in that <MeSHYear>. In the GCM_SEARCH.XML it has this
meaning as well and does not mean the MeSH year of the MeSH terms
in the <SearchPubMed> element, which will be the year prior to
the <MeSHYear>.
Example:
<MeSHYear>2004</MeSHYear>
Subelement of: <CitMaintTask>; <CitMaintSearch>
In file: GCM.XML, GCM_SEARCH.XML.
Required element: yes
<MTaskID>
Description: Unique identifier for each <CitMaintTask>. For
PrefTerm tasks the value begins with 'P', for Automatic tasks 'A',
and for Manual tasks 'M'. Will be unique across years. The
numeric portion has no inherent significance.
Examples:
<MTaskID>A2</MTaskID> <MTaskID>M1107</MTaskID>
Subelement of: <CitMaintTask>; <CitMaintSearch>
In file: GCM.XML; GCM_SEARCH.XML.
Required element: yes
<NewMeSHPrefTerm>
Description: Preferred term in MeSH for <NewMeSHUI>.
Null when Action is Delete. Critical for PrefTerm changes, may be
redundant for Automatic and Manual changes. May be the same as
<ExistingMeSHPrefTerm> in the same task when TaskSourceType
is Manual or Automatic. Example:
<NewMeSHPrefTerm>Oceanic Ancestry Group</NewMeSHPrefTerm>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
<NewMeSHRecType>
Description: The MeSH record type of the <NewMeSHUI>. One
of DESCRIPTOR, QUALIFIER, SCR. Null when Action is Delete.
Redundant in that the record type may be inferred from the
initial character of <NewMeSHUI> (D, Q, C). Designed to
make it easier for users of XML to extract actions pertaining to
only one record type. May be different from
<ExistingMeSHRecType> in the same task.
Example:
<NewMeSHRecType>DESCRIPTOR</NewMeSHRecType>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
<NewMeSHUI>
Description: UI of the MeSH record reference replacing the
existing value, or being added. Matches the seven-character
string in a <DescriptorUI>, <SupplementalRecordUI>,
or <QualifierUI>. Null when Action is Delete. Same value as
<ExistingMeSHUI> in the same task for PrefTerm change.
When a <DescriptorUI>, the value may include an
attached <QualifierUI>. (See example.)
Examples:
<NewMeSHUI>D043203</NewMeSHUI> <NewMeSHUI>D008628/Q000627</NewMeSHUI>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
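Since a <NewMeSHUI> value may carry an attached Qualifier UI after a slash, code that reads it may need to separate the two parts. A minimal Python sketch, using a hypothetical `split_ui` helper:

```python
def split_ui(value):
    """Split a <NewMeSHUI> value into (main UI, qualifier UI or None)."""
    main, _, qualifier = value.partition("/")
    return main, (qualifier or None)


# The two example values shown for <NewMeSHUI>.
plain = split_ui("D043203")
with_qualifier = split_ui("D008628/Q000627")
```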
<Sequence>
Description: Number indicating order in which tasks for a given
year are executed. The order in which the tasks must be performed
is: (a) Automatic, (b) Preferred Term, and (c) Manual. In
addition, a specific order may be required within the Manual
tasks. To guarantee this order, the <Sequence> values are
assigned in the following way:
All Automatic tasks have a value of 1.
All PrefTerm tasks have a value of 2.
All Manual tasks have a value of 3 or greater, depending on the
order specified by the analyst creating the Manual task.
Example:
<Sequence>1</Sequence>
Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: yes
<SearchPubMed>
Description: A citation search used to restrict the application
of a Manual task specified in GCM.XML. PubMed format - see
http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html.
Example:
<SearchPubMed>biota [nm] AND+MEDLINE+[sb]</SearchPubMed>
Subelement of: <CitMaintSearch>
In file: GCM_SEARCH.XML.
Required element: yes
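The link between the two files through <MTaskID> can be sketched in Python as follows. The sample fragment is hypothetical: it is built from the element names and example values in this document, and the pairing of task M1107 with this search is invented. The helper names are assumptions as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; real content comes from the GCM_SEARCH file.
SAMPLE_SEARCHES = """<CitMaintSearchSet>
  <CitMaintSearch>
    <MTaskID>M1107</MTaskID>
    <MeSHYear>2004</MeSHYear>
    <SearchPubMed>biota [nm] AND+MEDLINE+[sb]</SearchPubMed>
  </CitMaintSearch>
</CitMaintSearchSet>"""


def searches_by_task(search_xml):
    """Map each MTaskID to its PubMed search string."""
    root = ET.fromstring(search_xml)
    return {
        node.findtext("MTaskID"): node.findtext("SearchPubMed")
        for node in root.iter("CitMaintSearch")
    }


def manual_tasks_without_search(tasks, searches):
    """List IDs of Manual tasks (dicts keyed by GCM names) lacking a search."""
    return [
        t["MTaskID"]
        for t in tasks
        if t["TaskSourceType"] == "Manual" and t["MTaskID"] not in searches
    ]


searches = searches_by_task(SAMPLE_SEARCHES)
```

Checking for Manual tasks that lack a search before starting maintenance helps avoid applying a Manual task to an undefined set of citations.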
TaskSourceType
Description: Process by which task was created. One of the
following: PrefTerm, Automatic, Manual.
Example:
<CitMaintTask Action="Replace" TaskSourceType="Automatic">
Subelement of: n/a; attribute of <CitMaintTask>
In file: GCM.XML.
Required element: yes
Last updated: 01 September 2004
First published: 26 August 2004