Workshop report now available - http://www.osti.gov/physicalsciences

A Future Information Infrastructure for the Physical Sciences:
Concept and Assumptions

Dr. Walter L. Warnick, Director
DOE Office of Scientific and Technical Information
May 30, 2000, Workshop
National Academy of Sciences

An Idea of Consequence?

Information technology has raised researchers' expectations for access to information in the physical sciences. We in DOE believe that these expectations could be met by low-cost deployment of technology that is new but already exists.

This deployment has been called the Future Information Infrastructure for the Physical Sciences. It has also been called a National Library Initiative for the Physical Sciences, and a variety of other names.

We have convened this workshop to assess the utility and need for such an Infrastructure. Would it be welcomed by and useful to the science community? Is this an idea of consequence?

The Concept: What the Initiative Includes

What is included in this Initiative? It will be a gateway to comprehensive information in the physical sciences. The emphasis is on content, which makes the phrase "Information Infrastructure" not quite appropriate. The information would require "infrastructure," but it is the information itself that concerns patrons, not the infrastructure that delivers it. Although "Infrastructure" misses the primacy of content, it does name the means of delivering a host of linked information resources, so I will continue to use it for the purposes of today's discussion.

Today, we can share knowledge as never before possible. The very concept of an information collection is being revised. No longer need an information collection be actually collected in one physical location. Information can reside at multiple sites; it can be a virtual collection.

Similarly, the concept of a library is being revised. No longer need a library be a physical place. We now speak of digital libraries that are accessible from almost any place. They can have all the advantages of the Internet to which we have now become accustomed: almost instantaneous access, no cost to patrons, full text information.

The Infrastructure would bring information to the desktops of scientists, engineers, students, and others through the Internet. The concept is to deliver RELIABLE information in world-class products using new information technology, which is available now for the first time ever.

The Assumptions

For the first time, the new technology allows researchers to reach information centers like ours directly. The Infrastructure envisioned will continue to work with libraries and intermediaries, who traditionally have been our primary audience and who still have an important role, though one that is changing. Libraries guide their patrons to information resources; today they have a growing role in pointing their patrons to sources on the Internet. Already, the EPA has created the position of Internet Librarian and hired 24 of them.

Technology allows us to rapidly disseminate full text, as opposed to bibliographic information only. Already, our center at DOE has vast virtual collections of full text grey literature, a vast virtual collection of hyperlinks to full text journal literature, and searchable access to as many of the preprint servers in the physical sciences as we can find in the world. These web-based systems, all searchable and all introduced over the last three years, have surpassed all expectations in terms of use and access by scientists. If you would like to see any of these systems, please see me or one of my staff for a demonstration while we are here at this Workshop. These systems serve as a foundation on which to build a comprehensive library capability.

The web makes it possible for information to be searched, for the first time, at essentially zero marginal cost. However, getting the information loaded on a server, making it searchable, and providing a gatekeeper function to promote reliability all require capital investment. That is the key role of the Future Information Infrastructure (FII).

The web needs an economic model. The FII offers one. Scientific and technical information (STI) is the principal deliverable coming from the Federal government's $70 billion R&D program. Government is in the R&D business. What is the proper role of government in the dissemination of this STI? Let me offer an analogy between STI and the information contained in the white pages of the phone book. The phone book is disseminated to phone company customers at no incremental cost to the customer. The cost of the white pages is bundled together with the costs of the phone company's mainline business, which is operating the phone system. Why does the phone company do this? Presumably, the phone company does not charge separately for the white pages because giving them away makes good business sense.

As with the phone company's white pages, it is in the government's interest to disseminate STI from its R&D program without incremental cost to customers. In this way, the government gets maximum mileage out of its R&D enterprise, which amounts to $7 billion at DOE alone. It makes good business sense for the government not to charge for STI.

In one sense, the analogy breaks down, as no analogy is perfect. For many years, the government has insisted on recouping the marginal cost of disseminating STI. For example, NTIS charges $40 and more to send out a paper copy of a DOE report. But with the advent of the web, the marginal cost is essentially zero. Seen this way, giving away government-generated STI is consistent with long-standing government policy. Government will continue to charge the marginal cost, but the marginal cost is near zero.

While we are here these two days, you will hear about path-breaking information systems from several stakeholders from DOE and others around the room. The best of the systems deal with text: grey literature, journal literature, and preprints.

More needs to be done with text: digitizing the repository of historic physical science literature (which only exists in paper or microfiche); making the availability of electronic journal literature truly comprehensive; and making the physical sciences more user friendly for education, business, and communities.

But text is just the beginning. Once text is conquered, there are domains of images, video, and audio. Text is but the low-hanging fruit. Other media are on the frontier being explored by R&D programs. Among its functions, the Information Infrastructure would be a test bed bringing the results of IT R&D to practical application.

Additionally, the Infrastructure is envisioned to fulfill real-time communication needs of scientists as they are in the midst of the research process. Today Greg Wood will talk about the advances in communication to speed the sharing of knowledge.

Meeting the Need - Realizing the Vision

The Infrastructure will be the realization of a long-held vision of scientists: access to a comprehensive collection of information. The Infrastructure would satisfy the expectations for access on demand. The goal of having such a comprehensive collection easily available has been expressed repeatedly for decades.

In the read-ahead package you received, one paper is entitled "History of the Vision." It summarizes a few of the calls for a comprehensive collection of information in the physical sciences. In brief, it has long been recognized that shared knowledge is the enabler of scientific progress.

What is new today is the technology to realize the vision at an affordable cost.

Collaborators Are Another Key To Success

To bring this vision to fruition requires extensive collaboration. The new technology allows both distributed users and distributed resources. It invites collaboration among resource owners. A distributed and collaborative environment would have roles for the partners discussed below, including national laboratories and universities, publishers, libraries, and government agencies.

There is ample evidence that productive collaborations can be formed. The operative word is "productive." For example, in the DOE system, OSTI works closely with our National Labs and universities to compile our vast virtual collection of grey literature. They would continue to provide much of the content - or scientific information - for the Infrastructure. We work closely with publishers to compile our vast virtual collection of searchable hyperlinks to articles. The Infrastructure would not displace primary publishers, but rather complement their role by aiding the scientist in locating the information at the publisher's site. We work with libraries and will continue to as their services transition to support more Internet search and retrieval. We work with Government agencies to deliver our products to the public through Depository libraries and other venues.

What we need now is to look into the future and build on these successes. Collaboration allows us to unify distributed sources and make them all searchable, with the sources apparent to the user.

In looking forward, we must ensure that we use the knowledge of the past. For Federal government-sponsored information, which includes the great bulk of deliverables coming from basic research in the U.S., government institutions with an information mission are the best option for ensuring preservation. Beyond mere preservation, such organizations also have a responsibility to promote permanent public access. The FII would ensure preservation and permanent public access to physical science resources.

Instituting the Vision

To become a reality, the information infrastructure must be funded. We submit that it should be funded by the Federal government, just as the National Library of Medicine, whose focus is the life sciences, is funded. The National Library of Medicine is our best national success story. Kent Smith will discuss it later.

Several Federal Agencies, in addition to DOE, are involved in the physical sciences, and all have a role to play in a national library for the physical sciences. We need to get together, but how do we do it? In any such enterprise, the first step is often the hardest. What can experience tell us? I suggest that the situation with launching the Information Infrastructure is reminiscent of the human genome project.

In 1986, DOE - not NIH - launched the Human Genome Project. Specifically, it was initiated by the DOE Office of Energy Research (then headed by Alvin Trivelpiece). Now it is a great research project, involving NIH and others. Observe that it was only after the project was launched by DOE that NIH became involved. The lesson is that, for an interagency program to be formed, it is sometimes necessary for one agency to take the first step. For the FII, DOE may be ready to take that first step, anticipating that other Agencies will join later.

Summary: Primed for Action

We have heard the calls for a comprehensive collection in the physical sciences.

We know where the content is, and we can get to it.
We have the technology and have proven that it works on the kind of scales needed.
We build upon a 50-year tradition - a solid foundation on which to build the future.
We are poised to take the first step and to act now.

We need to understand what else is needed and what the next steps might be.

For the first time ever, the vision of an Information Infrastructure for the physical sciences or a National Library is feasible. Would it be welcomed by and useful to the science community? Is this an idea of consequence?