Federal Geographic Data Committee

Clearinghouse Concepts Q&A

This document describes the context of the National Geospatial Data Clearinghouse Network and details of its construction and operation.

What is Clearinghouse?

The Clearinghouse Activity, sponsored by the FGDC, is a distributed system of servers located on the Internet which contain field-level descriptions of available digital spatial data and services. This descriptive information, known as metadata, is collected in a standard format to facilitate query and consistent presentation across multiple participating sites. Clearinghouse uses readily available Web technology for the publication and discovery of available geospatial resources through the geodata.gov portal.

The fundamental goal of Clearinghouse is to provide access to digital spatial data and related online services for data access, visualization, or ordering. The Clearinghouse Network functions as a detailed catalog service with support for links to spatial data and browse graphics. Clearinghouse sites are encouraged to include hypertext links to online resources (e.g. map services, data download locations, data access services, applications) within their metadata entries to enable access to all facets of the described resource. Where digital data are too large to be made available through the Internet, or the data products are offered for sale, a link to an order form can be provided in lieu of a data set. Through this model, Clearinghouse metadata provides low-cost advertising for providers of spatial data, both non-commercial and commercial, to potential customers via the Internet.

Clearinghouse allows individual agencies, consortia, or geographically-defined communities to band together and promote their available digital spatial data through a metadata service. These servers may be installed at local, regional, or central offices, as dictated by the organizational and logistical efficiencies of each organization. All Clearinghouse servers are considered "peers" within the Clearinghouse activity -- there is no hierarchy among the servers -- permitting direct query by any user on the Internet with minimum transactional processing. When these Clearinghouse services are registered with the geodata.gov portal, the portal will harvest and cache a copy of the metadata for rapid retrieval, enabling search through a single interface to all registered assets in the U.S.

Why promote a Clearinghouse Activity?

The development of the Clearinghouse among U.S. Federal agencies was motivated by a desire to minimize duplication of effort in the collection of expensive digital spatial data and to foster cooperative digital data collection activities. By promoting the availability, quality, and requirements for digital data through a searchable online system, a Clearinghouse facility would greatly assist in the coordination of data collection and research activities. Clearinghouse also provides a primary data dissemination mechanism to traditional and non-traditional spatial data users.

Federal participation in the Clearinghouse is directed by Executive Order 12906 through its official creation of the National Spatial Data Infrastructure. Compliance with this Order and its notions of data sharing has gained Cabinet-level interest; the former chair of the FGDC was the Secretary of the Interior, Bruce Babbitt. Today the FGDC is co-chaired by senior officials in the Department of the Interior and the Office of Management and Budget.

Why not just use Internet search engines?

Digital spatial data and metadata are stored in many forms and systems, which makes their discovery on the Internet difficult. Structured metadata is typically exchanged in XML format, with significant meaning stored in 'fields' or XML elements, rather than in the HTML documents typically indexed by search engines. Current web indexing technology offers literal text search and matching for metadata that happen to be stored in HTML, but it does not generally provide the indexing required to search coordinates, dates and times, and other numeric values. In addition, some entire collections of metadata are managed within dynamic databases whose content is not accessible to search engines. The Clearinghouse functionality as implemented in the geodata.gov portal augments existing search engine technology (Google search appliance) with spatial query, permitting simple search of metadata based on location and full text. Field-level search is also available to refine searches by topical classification, geography, time, and other key fields in ways not possible with off-the-shelf search engine technology.
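
The difference between literal text matching and field-level search can be illustrated with a minimal Python sketch. It is not how geodata.gov is implemented; the `Record` fields, the sample records, and the `search` function are all hypothetical, and the bounding-box test assumes boxes that do not cross the 180-degree meridian.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """A simplified metadata record; field names are illustrative only."""
    title: str
    west: float
    east: float
    south: float
    north: float
    published: date


def boxes_intersect(rec, west, east, south, north):
    """True if the record's bounding box overlaps the query box.

    Assumes neither box crosses the 180-degree meridian.
    """
    return (rec.west <= east and rec.east >= west
            and rec.south <= north and rec.north >= south)


def search(records, text=None, bbox=None, after=None):
    """Combine full-text, spatial, and date filters; None means no filter."""
    hits = []
    for rec in records:
        if text is not None and text.lower() not in rec.title.lower():
            continue
        if bbox is not None and not boxes_intersect(rec, *bbox):
            continue
        if after is not None and rec.published < after:
            continue
        hits.append(rec)
    return hits


# Made-up sample records: (title, west, east, south, north, published)
records = [
    Record("Colorado roads", -109.1, -102.0, 37.0, 41.0, date(1998, 5, 1)),
    Record("Utah roads", -114.1, -109.0, 37.0, 42.0, date(2003, 2, 10)),
    Record("Colorado hydrography", -109.1, -102.0, 37.0, 41.0, date(2005, 7, 4)),
]

# Query box around Denver, given as (west, east, south, north).
denver = (-105.5, -104.5, 39.3, 40.1)

for rec in search(records, text="roads", bbox=denver):
    print(rec.title)
```

A text-only query for "roads" matches both road data sets, while adding the bounding box narrows the result to the record whose coordinates actually overlap the area of interest. A plain text index cannot make that numeric comparison.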

The general trend toward connectivity of spatial data producers, vendors, and users on the Internet, coupled with the provision of online data via web services, indicates a long-term public commitment not only to online data discovery but also to direct data access by client processes across internal and public networks. Clearinghouse provides one standards-based solution to catalog interoperability on the Internet today.

Who should participate in Clearinghouse?

Although initially targeted at federal agencies, participation in Clearinghouse prototypes has included federal, state, university, and vendor participants in the United States and abroad. Over 200 Clearinghouse servers are also in operation outside the United States supporting the same interoperability standards. In short, any domestic group, regardless of size, may publish its metadata to the Clearinghouse and make it visible in geodata.gov. Similar publishing portals exist in other countries for the coordination and publication of geographic resources outside the U.S.

The role of the FGDC in Clearinghouse is to develop prototype software, provide reference implementations, facilitate discussions among Clearinghouse participants, develop and present training materials, and operate a registry service of conforming spatial data servers. It is not the intent of the FGDC to create a centralized data system but to facilitate access to distributed stores of spatial metadata, data, and services on the Internet.

What are the requirements for being a Clearinghouse provider and user?

A prospective spatial data publisher must have access to an Internet-connected computer on a dedicated connection with a persistent public IP address and name. It is recommended that Clearinghouse metadata services be co-located on hosts with spatial data collections to encourage synchronization between the spatial data, services, and the metadata being served. Organizations that are not yet connected to the Internet, or that have firewall or security restrictions on being directly connected, may elect to contract with an existing Internet Service Provider or partner with a local Clearinghouse node in a different organization to provide an off-site host computer for Clearinghouse. An online registry is operated by the FGDC to track the operating details of existing Clearinghouse metadata services.

Prospective users of Clearinghouse must have access to a current Web browser. The user interface at geodata.gov uses standard JavaScript capabilities found in all current Web browsers. User access is supported via dial-up or broadband connection to the Internet. Both simple and advanced interfaces exist at geodata.gov to provide custom levels of search access.

What information is accessible through Clearinghouse?

A "digital geospatial data set" is the primary item being described with metadata in the Clearinghouse activity. The definition of a data set can be adjusted to meet a given agency's requirements, but it generally corresponds to the smallest identifiable data product (e.g. file) for which metadata are customarily collected. This may equate to a specific satellite image or vector data set that is managed by a data producer or distributor. Collections of data sets (e.g. flight lines, satellite "paths", map or data series) may also have generalized metadata that could be inherited by individual data sets.
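
One way to picture collection-level metadata being inherited by individual data sets is a simple merge in which data set values override collection defaults. The following Python sketch is only an illustration of that idea; the function name, field names, and values are hypothetical and are not taken from any FGDC specification.

```python
def resolve_metadata(collection, data_set):
    """Return the data set's metadata with collection values as defaults.

    Values recorded on the individual data set override inherited ones.
    """
    merged = dict(collection)   # start from the collection-level defaults
    merged.update(data_set)     # data-set-level values take precedence
    return merged


# Hypothetical series-level metadata shared by every scene in a satellite path.
path_metadata = {
    "sensor": "example-sensor",
    "processing_level": "L1",
    "contact": "series-office@example.org",
}

# Hypothetical metadata recorded for one scene only.
scene = {
    "title": "Path 34, row 32",
    "acquired": "1999-07-14",
    "processing_level": "L2",
}

resolved = resolve_metadata(path_metadata, scene)
print(resolved["sensor"], resolved["processing_level"])
```

Here the scene picks up the sensor and contact from its path, while its own processing level replaces the inherited one.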

Other geospatial resources may be described in the FGDC metadata, including online services (Web Map Service, Web Feature Service), data download locations, interactive web applications, documents, and other web-accessible resources. The Geospatial Data Presentation Form field in the metadata record can store this information, though other context can be inferred from the style of the URL. Also, FGDC metadata allows for multiple online linkages to be maintained in a metadata record, so multiple facets of the geospatial resource may be described.
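
As a rough illustration of how a client might read the presentation form and the multiple online linkages from a record, the Python sketch below parses a small XML fragment. It assumes the record uses the usual CSDGM short tag names (`idinfo`, `citation`, `citeinfo`, `title`, `geoform`, `onlink`); the sample record and its URLs are made up, and `describe_resource` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up record fragment using CSDGM short tag names.
SAMPLE = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <title>Example county roads</title>
        <geoform>vector digital data</geoform>
        <onlink>http://example.org/data/roads.zip</onlink>
        <onlink>http://example.org/wms?SERVICE=WMS&amp;REQUEST=GetCapabilities</onlink>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""


def describe_resource(xml_text):
    """Extract the title, presentation form, and all online linkages."""
    root = ET.fromstring(xml_text)
    cite = root.find("idinfo/citation/citeinfo")
    if cite is None:
        return {"title": None, "form": None, "links": []}
    return {
        "title": cite.findtext("title"),
        "form": cite.findtext("geoform"),
        # A record may carry several onlink elements, one per facet.
        "links": [el.text.strip() for el in cite.findall("onlink") if el.text],
    }


info = describe_resource(SAMPLE)
print(info["form"])
for link in info["links"]:
    print(link)
```

In this sample the first link points to a download and the second to a map service, which is the kind of context the text says can be inferred from the style of the URL.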

How does Clearinghouse work?

To provide search interoperability among different servers of geospatial metadata, the search and retrieve protocol known as ANSI Z39.50-1995 (ISO 23950) was selected by the FGDC Clearinghouse activity. The Z39.50 protocol uses client and server software to establish a connection, pass a formatted query, return query results, and present identified documents to the client in one of several formats. The Z39.50 protocol was initially developed by the library community to discover bibliographic records using a standard set of attributes that would allow any Z39.50 client to present information from different yet similarly structured servers. On the host (server) computer, Z39.50 server software typically communicates with an appropriate search engine (database or indexing software) to process the query and formulate the results. In this way, the Z39.50 protocol can provide an alternative access method to existing geospatial databases or metadata collections through a single, standards-based protocol, without requiring redesign of existing data systems.
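
The sequence described above (establish a connection, pass a query, return results, present documents) can be sketched as a toy in-memory model in Python. This is not a Z39.50 implementation and uses no real Z39.50 library; the `ToyServer` class, its method names, and the record fields are hypothetical and only mirror the order of the steps.

```python
class ToyServer:
    """In-memory stand-in for a search server; not a real Z39.50 server."""

    def __init__(self, records):
        self._records = records      # plays the role of the search engine
        self._connected = False
        self._result_sets = {}

    def init(self):
        """Establish the session before any query is accepted."""
        self._connected = True
        return True

    def search(self, name, field, term):
        """Run a fielded query, keep a named result set, return the hit count."""
        if not self._connected:
            raise RuntimeError("init must be called before search")
        hits = [r for r in self._records
                if term.lower() in r.get(field, "").lower()]
        self._result_sets[name] = hits
        return len(hits)

    def present(self, name, start, count, fmt="brief"):
        """Return records from a stored result set; start is 1-based here."""
        hits = self._result_sets[name][start - 1:start - 1 + count]
        if fmt == "brief":
            return [r["title"] for r in hits]
        return [dict(r) for r in hits]      # "full" format: whole records


server = ToyServer([
    {"title": "Colorado roads", "abstract": "Road centerlines"},
    {"title": "Colorado hydrography", "abstract": "Streams and lakes"},
    {"title": "Utah roads", "abstract": "Road centerlines"},
])

server.init()                                        # establish a connection
hit_count = server.search("default", "title", "roads")   # pass a query
brief = server.present("default", 1, 10)             # present in brief format
full = server.present("default", 2, 1, fmt="full")   # present one full record
print(hit_count, brief)
```

Keeping the result set on the server and presenting from it separately is what lets a client ask for the hit count first and then fetch only the records, and the format, it needs.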

The geodata.gov portal also allows the publication of metadata using three additional web-based methods. For groups interested in entering one or two metadata records interactively, an online form system exists to capture and store the metadata at the portal. For a small collection of metadata records in XML, a user may upload the XML representations of the metadata for management at the portal. For larger collections of metadata to be served in environments that are unable to run the Z39.50 protocol, a "Web Accessible Folder" (WAF) is available; this is a browse-enabled directory on a host organization's web server that holds the XML metadata for direct harvest by the portal. The geodata.gov portal and catalog provide search access to all four types of publication (online form, XML upload, WAF, and Z39.50) yet present the results in consistent ways.
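
To make the Web Accessible Folder idea concrete, the Python sketch below shows one plausible first step of a harvest: reading a browse-enabled directory listing and collecting the links to XML files. It is an assumption about how such a harvester could work, not a description of the portal's software; the listing HTML, the example.org URLs, and the `xml_links` helper are hypothetical, and the listing is an inline string so that no network access is needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def xml_links(listing_html, base_url):
    """Return absolute URLs for the .xml files named in a directory listing."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return [urljoin(base_url, href)
            for href in parser.hrefs
            if href.lower().endswith(".xml")]


# Made-up directory listing of the kind a web server generates for a folder.
LISTING = """<html><body><h1>Index of /metadata</h1>
<a href="../">Parent Directory</a>
<a href="roads.xml">roads.xml</a>
<a href="hydro.XML">hydro.XML</a>
<a href="readme.txt">readme.txt</a>
</body></html>"""

urls = xml_links(LISTING, "http://example.org/metadata/")
for url in urls:
    print(url)
```

A harvester built along these lines would then fetch each URL and index the XML it finds, which is why the folder only needs to be browsable rather than run any special server software.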