A Vision for a Digital Library for the Physical Sciences

Presentation to the President's Information Technology Advisory Committee (PITAC) Digital Libraries Panel

Dr. Walter L. Warnick, Director
DOE Office of Scientific and Technical Information

September 19, 2000


It is an honor for me to appear before the PITAC Digital Libraries Panel.

My office, the Office of Scientific and Technical Information (OSTI), has managed DOE's technical information program since the Manhattan Project of World War II. Our mission is to collect, preserve, and disseminate scientific and technical information (STI) for the research community. The manner in which we do this has changed drastically in recent years, and we must deploy new information technology (IT) continually. I believe that our exploitation and deployment of information technologies compare quite favorably with anything any other executive agency has done.

I am here to present a vision of the future. I hope that the Digital Libraries Panel tells us to get to work.

An Information Infrastructure for the Physical Sciences: Workshop Report

The DOE Office of Science held a workshop on May 30-31, 2000, at the National Academy of Sciences to assess the need for, and the utility of, an Information Infrastructure for the Physical Sciences (IIPS), a vision for a digital library for the physical sciences. For decades, researchers have expressed a need for such an infrastructure, and today information technology has raised their expectations of immediate, online access to information in the physical sciences. We in DOE believe that many of these expectations could be met by low-cost deployment of new technology that already exists. But we needed to hear from the physical science community.

Workshop Panel/Findings

The workshop was chaired by Dr. Alvin Trivelpiece and included experts predominantly from the physical sciences community.

The findings support the need for a common knowledge base that provides comprehensive access to, and facilitates re-use of, worldwide sources of physical science information. The workshop also noted that DOE/OSTI could serve well as the needed point of convergence and lead a collaborative effort encompassing government, academia, professional organizations, and industry.

The workshop also called for test beds, as other PITAC reports have done.

Forming a Strategy

While the Workshop laid out a vision, developing a strategy to achieve it was largely beyond the Workshop's scope. The participants did, however, make clear that the strategy must be interagency.

Each agency must do that which is consistent with its strengths and mission. For example:

Building Knowledge Assets

DOE has made progress in laying the foundation for a digital library in the physical sciences.

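As you know, researchers communicate their results in three ways: through reports (often called gray literature), through articles in journals, and through preprints.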

DOE has created a product that extends the state of the art in dissemination for each of these three channels. For reports, or gray literature, we deployed the Information Bridge in April 1998. It includes essentially all DOE gray literature produced since January 1995: 60,000 full-text reports, over 5 million pages, all searchable by every word and available to the public at no cost to the patron, thanks to a partnership with the Government Printing Office (GPO). We think it is the best gray literature product.

For journal literature, we deployed PubSCIENCE in October 1999 in partnership with journal publishers, who now number 34. Publishers provide us with bibliographic data for their articles; we host the data and make it searchable and available to the public at no cost to the patron, again thanks to a partnership with GPO. PubSCIENCE includes over 2 million articles from over 1,400 journals. It is modeled closely after PubMed: PubSCIENCE does uniquely for the physical science community what PubMed does uniquely for the medical community.

For preprints, we deployed the PrePRINT Network in January 2000. My office does not operate any preprint servers; rather, the Network allows patrons to search across preprint servers operated by others. It makes over 1,500 preprint servers searchable. We are not aware of any other product that performs this service.

GrayLIT Network

One of the things the Workshop Report called for was a demonstration of near-term successes. It also called for interagency cooperation, because IIPS covers more than the disciplines important to DOE's mission. Two products we have introduced since the Workshop meet both goals: working collaboratively with interagency partners and demonstrating near-term successes.

The GrayLIT Network, at www.osti.gov/graylit, was launched in August 2000. It is a new interagency product developed by DOE/OSTI in collaboration with several other Federal agencies. The GrayLIT Network is the largest single source of Federal agency gray literature, with over 100,000 items easily accessible over the Internet. It provides single-query public access to full-text scientific and technical reports sponsored by its participating agencies, including the Environmental Protection Agency (EPA), the National Aeronautics and Space Administration (NASA), the Department of Defense (DOD), and DOE.

Federal R&D Project Summaries

Another web-based interagency tool made available in August 2000 is the Federal R&D Project Summaries. This tool, available at www.osti.gov/fedrnd, was developed in conjunction with the National Science Foundation (NSF) and the National Institutes of Health (NIH) to improve access to information about federally sponsored research and development. It allows researchers to search across over 240,000 research and development summaries and awards from NSF, NIH, and DOE. Its significance lies in the interdisciplinary nature of scientific discovery.

Both of these systems depend on collections that individual agencies developed for their own purposes. A distributed search engine sends each query to (or "pulses") the search engines already present in each agency's collection, compiles the results, and presents them to the patron. The designers of these collections never intended them to be combined with any other collection. Nevertheless, none of the collections had to be modified in any way to build the interagency system. Indeed, the owners of the individual agency collections would not even know that their collections are part of the interagency system if my office had not communicated and coordinated with them.
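The distributed search pattern described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the code behind the GrayLIT Network or the Federal R&D Project Summaries. Each agency collection is modeled as a hypothetical function that takes a query and returns hits from its own, unmodified search engine; the names `Hit` and `federated_search`, and the per-source score normalization, are illustrative choices. The query goes to every collection in parallel, each source's scores are scaled relative to its own top hit (raw relevance scores from different engines are not directly comparable), duplicate titles are dropped, and the merged list is ranked.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    source: str   # which (hypothetical) agency collection returned the hit
    title: str
    score: float  # relevance score as reported by that source's own engine


# Hypothetical model of a collection: any function that takes a query string
# and returns hits. Each would wrap an agency's existing search interface,
# which itself is left unmodified.
SearchFn = Callable[[str], List[Hit]]


def federated_search(query: str, collections: Dict[str, SearchFn],
                     timeout: float = 10.0) -> List[Hit]:
    """Send one query to every collection in parallel and merge the results."""
    merged: List[Hit] = []
    with ThreadPoolExecutor(max_workers=len(collections) or 1) as pool:
        futures = [pool.submit(fn, query) for fn in collections.values()]
        for fut in futures:
            try:
                hits = fut.result(timeout=timeout)
            except Exception:
                # A slow or failing collection is skipped rather than
                # failing the whole search.
                continue
            # Scale this source's scores relative to its own best hit.
            top = max((h.score for h in hits), default=0.0)
            for h in hits:
                norm = h.score / top if top > 0 else 0.0
                merged.append(Hit(h.source, h.title, norm))
    # Drop duplicate titles (case-insensitive), keeping the best-scoring copy.
    best: Dict[str, Hit] = {}
    for hit in merged:
        key = hit.title.lower()
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    # Rank the compiled results for presentation to the patron.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)
```

For example, with two hypothetical collections `epa` and `nasa` passed as `federated_search("remediation", {"EPA": epa, "NASA": nasa})`, a report indexed by both appears once in the merged list, and a collection that raises an error simply contributes nothing. Scaling each source to its own top hit is the simplest normalization; a production system would likely need something more careful.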

"The Warnick Strategy"

The umbrella under which DOE's resources are linked and accessible electronically is called EnergyFiles, Virtual Library Collections of Energy Science and Technology.

Key Scientific Content Organizations in U.S. Government

The stage is already set for the kind of interagency collaboration envisioned by the Workshop Report. Ten Federal agencies have an information operation like OSTI. Each of these operations is a member of an interagency group called CENDI, an acronym formed from the first letters of the participating agencies.

All these operations except the National Library of Medicine (NLM) have one thing in common: they are seldom used to implement the results of IT R&D. They have little or no funding to deploy prototypes or to serve as test beds. A future Information Infrastructure for the Physical Sciences should count among its goals bridging this chasm between research and deployment.

Conclusions

The PITAC Digital Libraries Panel should consider: