
Story: Infrastructure for the Global Compositae Checklist in Place and Working



GBIF 2005-2006 Seed Fund award results in software for integrating checklists of taxa, and a solid beginning on a Global Species Database for the Compositae (Asteraceae) family of plants.
Released on: 28 May 2008
Contributor: Meredith Lane
Language: English
Spatial coverage: Not applicable
Keywords:
Source of information: GBIF Secretariat, from Final Report on the Seed Fund project
Concerned URL: http://www.gbif.org/Stories/STORY1141828042

This GBIF Seed Fund project was established to develop three outputs as initial steps toward a truly global checklist for the Compositae, which has been identified as the largest single gap (10% of all flowering plants) in a potential checklist for the vascular flora of the world:

  • the informatics tools necessary to manage the integration of multiple existing checklists into a single consensus view, while maintaining linkages to multiple taxonomic opinions and individual data provider records,
  • a baseline integration of some key datasets from identified project partners, and
  • the beginnings of a network of taxonomists necessary to use the resulting baseline data to actually develop a global checklist.

C-INT, the integration software produced by the project, provides a managed and automated workflow for integrating different checklists. It handles a number of checklist data components:

  • nomenclatural objects,
  • multiple taxon concepts, and
  • literature references.
C-INT then identifies commonalities of these components among the imported checklists, and outputs consensus checklist records. It also provides an editorial interface to these derived checklist records for use by human editors.
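The story does not describe C-INT's actual matching rules, so the following is only a minimal Python sketch of the general idea: provider records that appear to refer to the same name are grouped into one consensus record, which keeps links back to every contributing record. The record shapes, the `normalize` key (case and whitespace only), and the taxon names are all hypothetical placeholders, not C-INT's data model or real taxa.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProviderRecord:
    """One taxon-concept record from a single checklist (hypothetical shape)."""
    checklist: str
    name: str
    synonyms: tuple = ()


@dataclass
class ConsensusRecord:
    """A derived record that keeps links to every contributing provider record."""
    name: str
    synonyms: set = field(default_factory=set)
    sources: list = field(default_factory=list)


def normalize(name: str) -> str:
    # Naive matching key (an assumption): collapse whitespace, ignore case.
    return " ".join(name.split()).lower()


def integrate(records):
    """Group provider records sharing a normalized name into consensus records."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec.name)].append(rec)

    consensus = []
    for recs in groups.values():
        merged = ConsensusRecord(name=recs[0].name)
        for rec in recs:
            merged.synonyms.update(rec.synonyms)  # union of provider synonymies
            merged.sources.append(rec)            # provenance is preserved
        consensus.append(merged)
    return consensus


# Placeholder names, invented for illustration only.
records = [
    ProviderRecord("Checklist A", "Exemplum album", ("Exemplum candidum",)),
    ProviderRecord("Checklist B", "exemplum  album",
                   ("Exemplum candidum", "Exemplum niveum")),
    ProviderRecord("Checklist B", "Exemplum rubrum"),
]
result = integrate(records)
```

Here three provider records yield two consensus records, and the first carries the union of the synonyms reported by both checklists. A real integrator would need far more careful matching (authorship, rank, orthographic variants), which is why the editorial interface for human editors matters.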

C-INT is now fully functional and has (as of April 2008) been used to integrate 24 checklists (a number of other datasets are in the pipeline). From these, C-INT has generated consensus taxon-concept records for 5,000 genera, 35,000 species, and 5,000 infraspecific taxa. In total, 90,000 provider taxon-concept records have been integrated into 60,000 consensus taxon-concept records. The consensus taxon concepts include 39,000 synonyms integrated from 53,000 synonyms present in data-provider records.

These figures indicate that there is substantial overlap among the names recorded in different checklists, but less overlap among the synonymies recorded in those checklists. This suggests that regional checklists concentrate on regionally relevant synonyms. The process of integrating separate regional checklists is therefore serving its intended functions: creating a baseline database (proto-GSD) from different sources, and approaching the totality of concepts that are potentially related to each taxon.
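One way to read the figures reported above is to compare how much each kind of record shrank during integration. The short Python sketch below computes that reduction from the counts given in the story; treating the reduction as a proxy for overlap is an interpretive assumption, not a measure defined in the project report.

```python
def reduction(provider_count: int, consensus_count: int) -> float:
    """Fraction of provider records that merged away during integration."""
    return 1 - consensus_count / provider_count


# Counts as reported in the story (April 2008).
concept_reduction = reduction(90_000, 60_000)
synonym_reduction = reduction(53_000, 39_000)

print(f"taxon concepts: {concept_reduction:.1%}")  # taxon concepts: 33.3%
print(f"synonyms:       {synonym_reduction:.1%}")  # synonyms:       26.4%
```

Roughly a third of the provider taxon-concept records merged into existing consensus records, against about a quarter of the synonyms, which is consistent with synonymies overlapping less than names do.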

The 35,000 species concepts generated by C-INT for the Compositae to date already exceed the current global estimate of species numbers for the family, most likely because of numerous redundancies. The number of redundant consensus concepts will fall as a human editor checks the entries generated by the software and identifies the redundant ones.

Christina Flann, the current checklist editor, has secured a 3-year postdoctoral position at Wageningen University to continue the editorial work on the baseline (proto-GSD) Compositae Checklist.

The algorithmic, rule-based integration provided by C-INT has significantly reduced the work that Dr. Flann will have to carry out, and has provided a robust data management environment within which to track both automated and human-edited integration of multiple taxon concepts.

C-INT will continue to be developed, and will be used in a number of new projects, such as a regional ‘catalogue of life’ for New Zealand (NZOR - New Zealand Organisms Register). Thus, the GBIF Seed Fund award will have achieved the purpose of this GBIF funding mechanism: it will have "seeded" developments and projects that go beyond the work directly supported by the monies provided by GBIF.

Please note that this story expired on 2008/06/27
