PowerPoint presentation

The Science Information Infrastructure: An Integrated Network for Finding and Using Information About Our Physical World

Dr. Walter L. Warnick
Office of Scientific and Technical Information
US Department of Energy


[morning session]

I am honored to be here today. I want to share with you the vision that several of my colleagues and I have for a science information infrastructure. We are creating a network of government science information for researchers, students, and consumers to find and use information. This vision is not new, but new information technology has made it practicable for the first time.


[Slide 2 - Historical Foundation]

The need for a comprehensive science resource has been well documented for decades. A few key studies are noted on this chart. One of the earliest was the 1945 report of Dr. Vannevar Bush. He called for scientists to make the vast store of knowledge more accessible. Bush saw great potential for focusing scientific knowledge in new directions. He noted that scientific progress was essential for the good of the country.

He advanced the notion that science was a proper concern of government. From the 1950s through the 1990s, several other studies supported and expanded upon these views. Today, the government has enormous information collections to share.

[Slide 3 - Recent Support]

Information technology has enabled new ways to deliver information. But IT has also raised expectations. Researchers want immediate, online access to data and scientific resources. Students now conduct most of their research on the Internet. They need the right information in the right format at the right time. Consumers want answers quickly, too, with Government organizational barriers removed.

These factors and others have driven several current initiatives, such as FirstGov and others shown here.

[Slide 4 - Shared Knowledge]

It is self-evident that shared knowledge is the enabler of scientific progress.

The U.S. Department of Energy is part of a vast public enterprise of scientific research, totaling $80 billion across the U.S. Government. This research discerns the laws of nature so that they can be applied for the improvement of humankind. The principal deliverable coming from research is scientific and technical information (STI). In the minds of students and scientists, information becomes knowledge. Science cannot progress and be made to improve the human condition unless knowledge is first shared. The message is simple. To attain scientific progress and prosperity and to improve the human condition, we must first promote the communication of STI.

The Federal R&D enterprise is a huge success story of the last half century. Many of the technologies that are now securing our nation and preserving our health and welfare were built upon government investments of the 1960s and 1970s. DOE and its predecessor agencies have been the proud sponsors of science-driven growth through the combined efforts of the national laboratories, 73 Nobel Laureates, and thousands of outstanding university and industry-based researchers nationwide.

Science agencies share specific responsibilities for disseminating scientific information. Such responsibilities have been codified in law. For example, enabling legislation for my office, in part, instructs us to conduct an information program "to provide free interchange of ideas and criticism which is essential to scientific and industrial progress and public understanding and to enlarge the fund of technical information." Yes, the results of Federally funded research need to be easily accessible. But the dissemination function for each Agency goes beyond the research it has funded. The purpose of the R&D program is to advance key disciplines; information dissemination programs at the science agencies are performed for the same reasons. Today, agencies recognize that the web is the tool of choice to make the dissemination of scientific and technical information happen as never before.


[Slide 5 - PSII Workshop]

My office began developing the concept for a future information infrastructure a couple of years ago. We believe that the deployment of current technology could provide an integrated network of dispersed science resources. We sponsored a workshop last year.

The Workshop Report yielded a high-level vision for a comprehensive "Physical Sciences Information Infrastructure." It also recommended that a strategy be formed for development and implementation. The report also called for an interagency project to be undertaken.

[Slide 6 - Workshop Panel]

The May 2000 workshop was chaired by Dr. Alvin Trivelpiece. Experts came from both the scientific disciplines and the information professions familiar with scientific communications. This group of experts strengthened the vision that we held for a Science Information Infrastructure.

[Slide 7 - Workshop Findings]

The workshop findings were consistent with historical and recent studies. The infrastructure envisioned would be a convergence of content, technology, and tools.

Following the 2000 Workshop, it was agreed that an interagency strategy was needed to achieve the vision. Various science agencies had much to offer. Each agency would contribute content and other resources consistent with its mission. For DOE, OSTI was the office to lead participation in such an effort. I will next address highlights of DOE's contributions.

[Slide 8 - OSTI Web-Based Products]

Since 1947, OSTI's mission has been to collect, preserve, disseminate, and manage STI for the Department. Once relying primarily on paper-based processes, OSTI's business has been transformed by the deployment of digital technologies. Featured on this slide are a few of the key web-based products offered by my office.

Researchers communicate their results to others in three main ways: gray literature (such as technical reports), journal literature, and preprints.

DOE has created a product that extends the state of the art of dissemination for each of the three types of scientific literature. Each product is available to the public at no cost to the patron.

For gray literature, we deployed the DOE Information Bridge <http://www.osti.gov/bridge> in April 1998. It currently has over 70,000 full-text reports, each word searchable, produced by DOE laboratories, university researchers, and other facilities since January 1995.

For journal literature, OSTI launched PubSCIENCE <http://www.osti.gov/pubscience> in October 1999 in partnership with journal publishers, now numbering over 40 partners. The publishers provide bibliographic citation data for their articles; we host them and make the citations searchable and available to the public at no cost to the patron. PubSCIENCE includes about 2 million citations from close to 1000 journal titles. PubSCIENCE is modeled closely after PubMed, a similar service offered by the National Institutes of Health (NIH). What PubMed does for journal literature in the medical sciences, PubSCIENCE does for journal literature in the physical sciences.

Both PubMed and PubSCIENCE provide hyperlinks that lead directly to the full-text articles if the patron has an electronic journal subscription or site license.

Today, all of the mission R&D agencies have put their bibliographic records on the web for free, e.g., the Department of Agriculture's AGRICOLA, the Department of Education's Educational Resources Information Center (ERIC), the Department of Transportation's Transportation Research Information Services (TRIS), and others. However, PubMed and PubSCIENCE represent a new way of compiling bibliographic databases through partnerships with the journal publishers that are the sources of the full-text articles. Using these two systems, students, researchers, and science-attentive citizens can quickly conduct journal literature searches via the Internet.

For preprints, OSTI deployed a service called the PrePRINT Network <http://www.osti.gov/preprint> in January 2000. We do not operate any preprint servers; rather, the Network allows patrons to search across other people's preprint servers. The PrePRINT Network is a searchable gateway. We estimate that there are currently about 8000 preprint sites covering scientific and technical disciplines of concern to DOE. The PrePRINT Network now makes 6000 of these sites searchable, numbering over 375,000 preprints. The remaining 2000 sites will be included in the PrePRINT Network by year end.

While we have a number of other products that we believe to be especially innovative, time does not permit me to describe them.

[Slide 9 - GrayLIT Interagency Model]

A new tool was developed in summer 2000 using the Distributed Explorer Directed Query Engine to search and combine different sets of Federal information. Developed in response to recommendations from the Physical Sciences Information Infrastructure (PSII) Workshop <http://www.osti.gov/physicalsciences> held in May 2000, the site signifies a new collaboration among Federal agencies to enable convenient access to government information by the American public. It supports an interdisciplinary view of science by providing the opportunity to look beyond one's specialty and to access and combine relevant information from other disciplines, unbounded by organizational lines.

With this tool, it is no longer necessary for a science-attentive citizen to know which agency is working in a particular area or discipline.

The GrayLIT Network <http://www.osti.gov/graylit> was created as a result of an agreement in principle among the Defense Technical Information Center (DTIC), NASA, the Environmental Protection Agency (EPA), and DOE. Each of these agencies had already developed one or more extensive electronic collections of full-text gray literature freely available at the agency's web site. The GrayLIT Network provides a single interface to search across the combined deep web information, currently numbering over 130,000 documents, without placing any burden on the participating agencies. It met the call of the PSII Workshop participants for an early success and provides an interagency model for disseminating STI in ways that are not agency-centric.
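The idea of a single interface that searches several agency collections at once can be illustrated with a minimal sketch. It is not the Distributed Explorer engine, whose internals are not described in this talk. It assumes each collection can be treated as a function that takes a query and returns matching titles. The names directed_query, make_collection, and Hit, along with the sample titles, are hypothetical and invented for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    agency: str  # which collection returned the document
    title: str   # document title as reported by that collection


# A collection is any callable that maps a query string to a list of titles.
# Real agency collections would be remote web search services; these are stand-ins.
Collection = Callable[[str], List[str]]


def make_collection(titles: List[str]) -> Collection:
    """Build a toy in-memory collection using case-insensitive substring matching."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [t for t in titles if q in t.lower()]
    return search


def directed_query(query: str, collections: Dict[str, Collection]) -> List[Hit]:
    """Send one query to every collection in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in collections.items()}
    # Leaving the with-block waits for all searches to finish.
    hits = [Hit(name, title)
            for name, fut in futures.items()
            for title in fut.result()]
    # Present one combined list, ordered by title rather than by agency.
    return sorted(hits, key=lambda h: (h.title.lower(), h.agency))


# Hypothetical sample data, invented for illustration only.
COLLECTIONS = {
    "DOE": make_collection(["Solar cell efficiency study", "Reactor coolant analysis"]),
    "NASA": make_collection(["Solar sail propulsion concepts"]),
    "EPA": make_collection(["Groundwater monitoring methods"]),
    "DTIC": make_collection(["Solar power for remote sensors"]),
}

if __name__ == "__main__":
    for hit in directed_query("solar", COLLECTIONS):
        print(f"{hit.agency}: {hit.title}")
```

In this sketch the patron supplies only the query. The gateway decides which collections to ask and returns one merged list, so the patron does not need to know which agency holds a given report.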

After the Workshop on the Information Infrastructure for the Physical Sciences, a dialogue began on the nature of a collaborative site -- and the potential forms it could take. Using a FirstGov-like approach, scientific information could be directed more effectively toward the science-attentive citizen.

[Slide 10 - PITAC Quote]

The PITAC 2001 Digital Library Panel report states its vision for universally accessible collections of human knowledge this way:

"No matter where the digital information resides physically, sophisticated search software can find it and present it to the user. In this vision, no classroom, group, or person is ever isolated from the world's greatest knowledge resources."

The Digital Library Panel report advances the notion that digital library and information technologies will have a transformational impact on government services and information.

[Slide 11 - Strengthening the Public Information Infrastructure for Science]

Planning began in spring 2001 with a number of agency and organization representatives to form an interagency science portal. Because DOE cannot and should not do a science information infrastructure alone, several science agencies are involved.

A workshop was convened April 18-19, 2001, at the National Institute of Standards and Technology (NIST) for Federal agencies, academic experts, and other information professionals to explore means for improving access to science information. The workshop was sponsored by DOE; its organizers included the University of Maryland Center for Information Policy, the CENDI Information Managers Group, NIST, and NSF.

The face of science is changing as it adapts to the new opportunities of the electronic world to conduct research. Science is not bounded by organization, although organizations sponsor science. The Web offers a way to bring organizational information and resources into a distributed digital environment where they can be reorganized and used by researchers to meet specific needs.

Over 60 workshop participants discussed the profound effect that the World Wide Web has had on the conduct and communication of science. Agency representatives also recognized the tremendous opportunities for making Federal agencies' science information more accessible and the value of making science information resources more useful to researchers, teachers, and learners wherever they are located.

[Slide 12 - Shared Premise]

The web is used by all the science agencies now to offer services to their patrons, although most of the activities are agency-centric.

This is as it should be. Each agency is funded first and foremost to support its own mission. However, science is not bounded by organization or geography. More effective access to scientific information can be achieved by taking a logical, user-oriented approach. A number of approaches and perspectives have been advanced to support the vision of "borderless" information.

The goal now is to transcend organizational boundaries and to develop a portal for STI. The science.gov domain has been reserved for an interagency portal to support the vision of a science-centric model, rather than the traditional agency-centric model. It is not only possible to overlay technology onto geographically dispersed content, but it is possible to do so at minimum cost and with minimum requirements.

The result will be a vision realized: a comprehensive source for finding, understanding, and using information about our physical world. All made possible through a vast network of partners in a multi-organizational collaboration.

In short, agency tools will be combined and integrated into interagency tools. The deep web includes many vast collections, which may be viewed as the building blocks of one wing of science.gov.

Resource linking and deploying directed query engines are likely near-term measures of success. In the mid-term, tremendous possibilities exist. We know the technologies are coming that will provide an integrative mechanism for this content - to enable a patron to access science.gov and find paths to this scientific content, along with the ability to search across it, and to sort it according to his or her particular needs.

It is an exciting prospect from the standpoint of both an information provider and a user. And it is achievable if the science agencies work together to provide mechanisms for universal access to scientific knowledge. This approach enables participants to take advantage of what agencies already do for their own purposes, but with the portal acting as a service to the agencies, for the benefit of the American people.

[Slide 13 - science.gov participants]

At the April 2001 workshop, the concept of science.gov was endorsed as the interagency science portal or gateway whereby the agencies would collectively serve the science-attentive citizen. Recognizing that "the building blocks are available now" and there is "no need to wait - no need to experiment," the principal science agencies listed here agreed to form the Science.gov Alliance.

[Slide 14 - Workshop Report]

Science.gov will maximize use of the science, regardless of the agency or organization that conducts it, where it resides, or the form or format in which it resides.

A number of Federal science agencies have formed an alliance to work together and to make an interagency science portal a reality. This comprehensive science portal is now being referred to as Science.gov. The DOE Office of Scientific and Technical Information is one of the principals of the alliance. Science.gov is built upon the foundation that the science agencies have created through their long tradition of ensuring access to STI.

[Slide 15 - FirstGov for Science prototype website]

Much progress has been made since the April workshop. A session this afternoon at 1:30 will provide more information about the science.gov initiative. This is a measure of how rapidly things change.

[Slide 16 - Universal Access]

In summary, we hope that the year 2002 will see the official launch of an interagency science gateway, science.gov, to serve the needs of the research community and provide universal access to government science information for the science-attentive citizen.