The PrePRINT Network: An Element of the Future Information Infrastructure for the Physical Sciences


Walter L. Warnick, Ph.D., Director
Office of Scientific and Technical Information
U.S. Department of Energy

Society for Scholarly Publishing Annual Conference
Baltimore, MD
June 1, 2000

The Scientific and Technical Information Program

Thank you for the opportunity to speak to you. It is an honor to be here today.

Before I get into the heart of my presentation and the preprint topic, let me give you some background on the organization I represent.

For 53 years, the Office of Scientific and Technical Information (OSTI) within the Department of Energy has been managing DOE's technical information program. The origin of this program was the Manhattan Project during World War II. From the beginning the fundamental purpose of this program was to ensure that research results were reported and made available to the agency and the broader scientific community.

The mission of OSTI today is still the same: to collect, preserve, disseminate, and leverage the scientific and technical information (STI) resources of the Department of Energy. OSTI provides access to national and global STI for use by DOE, the scientific research community, academia, US industry, and the public.

Though the mission remains the same, the manner in which it is met is very different.

Our deployment of Information Age technologies has radically changed OSTI's service to our customers. OSTI has no choice; we must remain modern.

As Secretary Richardson stated:

"For science to rapidly advance at the frontiers, it must be open.
And shared knowledge is the enabler of scientific progress."

Scientific research and the knowledge and technologies that follow have been credited with about half of the productivity growth of the United States' economy in the past fifty years. The Department of Energy has been a proud sponsor of science-driven growth through the combined efforts of the National Laboratories, 70 Nobel Laureates and thousands of other outstanding university and industry-based researchers nationwide. Well over 100,000 people are directly involved.

Information: Fueling the Science Mission

Today, almost all basic research in the U.S. is funded by the Federal government. The Department of Energy invests $7 billion annually in R&D.

The principal deliverable from R&D is scientific and technical information. STI serves the science mission of the Department as well as serving researcher needs. It is in the vital interest of all research agencies that STI be disseminated as broadly and as quickly as possible.

This is the driving factor in our push to make STI more accessible. A vision has emerged of the great potential that advanced digital technologies offer. By tapping into the Information Age, we can place STI right at the desktop, ready for use by DOE scientists and program managers to fuel the Department's science mission. I'd like to share with you today our recent achievements.

OSTI-Developed Information Age Products

Until recently, the method of disseminating DOE research results was through bibliographic databases. First was the Nuclear Science Abstracts (NSA). NSA is an historic record of nuclear research from the early 1940s through June 1976. The scope was then broadened by the Department of Energy, and the Energy Science and Technology Database (EDB) covers from 1974 to the present. It is a comprehensive source of worldwide energy-related bibliographic information, both nuclear and non-nuclear. Both databases contain "information about information," which we now call "metadata." Together these databases offer more than 5 million records in energy science and technology.

Then along came the Information Age. Several vast virtual collections have been compiled to meet the needs of DOE's research and development (R&D) community. Researchers communicate their findings in three main ways.

The PrePRINT Network

DOE's most recent Web-based product is the PrePRINT Network, launched on January 31 of this year. The PrePRINT Network is a seamless gateway to preprint servers dealing with scientific and technical disciplines such as physics, materials, chemistry and other disciplines of concern to DOE. My office does not operate any of the preprint servers. Rather the PrePRINT Network is a gateway to 1,000 preprint servers run by other folks. These servers host 330,000 preprints. Much depends on the field of science.

Patrons have several search options, including the ability to query multiple preprint servers and to browse by subject. In one approach, the patron places a query, and the PrePRINT Network (PPN) passes it to several selected databases, has their own search engines run the search, and then compiles the results for the patron. Essentially, the network is acting as a PARALLEL PROCESSOR, uniquely created for searching across multiple sources that do not have standardized data formats and are geographically dispersed. The user no longer has to know ahead of time which preprint server holds the information he seeks.
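For readers who want to see the idea in miniature, the fan-out-and-compile pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how the PrePRINT Network is actually built: the two "servers" are hypothetical in-memory stand-ins, and the names search_server_a, search_server_b, SERVERS, query_one, and distributed_search are invented for the example. Each stand-in returns records in its own format, and a small adapter converts them to a common shape before the results are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote preprint servers. Each has its own
# search routine and its own record format, as independent servers would.
def search_server_a(query):
    records = [{"title": "Neutron scattering in thin films", "link": "a/1"},
               {"title": "Lattice QCD review", "link": "a/2"}]
    return [r for r in records if query.lower() in r["title"].lower()]

def search_server_b(query):
    records = [("b/7", "Thin films for photovoltaics"),
               ("b/9", "Catalysis on metal oxides")]
    return [r for r in records if query.lower() in r[1].lower()]

# One (search function, adapter) pair per server. The adapter maps the
# server's native record to a common {"title", "link"} shape.
SERVERS = {
    "server_a": (search_server_a,
                 lambda r: {"title": r["title"], "link": r["link"]}),
    "server_b": (search_server_b,
                 lambda r: {"title": r[1], "link": r[0]}),
}

def query_one(name, query):
    """Run the query on one server and normalize its hits."""
    search, normalize = SERVERS[name]
    try:
        return [dict(normalize(r), source=name) for r in search(query)]
    except Exception:
        # Assumption: one failing server should not sink the whole search.
        return []

def distributed_search(query, selected=None):
    """Send the query to all selected servers in parallel and merge the hits."""
    names = sorted(selected or SERVERS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_server = pool.map(lambda n: query_one(n, query), names)
        return [hit for hits in per_server for hit in hits]

if __name__ == "__main__":
    for hit in distributed_search("thin films"):
        print(hit["source"], hit["title"], hit["link"])
```

The patron supplies a single query and never names a server. Adding another source in this sketch means adding one entry to SERVERS, which is the property that makes this style of gateway inexpensive to extend.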

Another way is indexing web pages. We work with the owners of these servers. A new service is an "Alert" service that will be operational by August. If you know about servers that we have overlooked, let me know.

The parallel processor searching capability is the same information technology used in another OSTI product, EnergyPortal search, a special search feature within our Virtual Library Collections of Energy Science and Technology. The implications for building inexpensive distributed digital libraries are truly profound.

PubSCIENCE

"PubSCIENCE" was developed to facilitate searching and accessing peer-reviewed journal literature in the physical sciences and other energy-related disciplines.

Following the path forged by the National Library of Medicine with its life sciences product, PubMed, OSTI determined that the new Web technology could be used to integrate citations and abstracts into a searchable database, utilizing hyperlinks to take patrons to the publishers' doorstep where full-text information could be obtained. In assessing the need for such a collection in the physical sciences, OSTI worked closely with the American Physical Society. No comparable commercial product was available. PubSCIENCE filled a void.

An exciting feature of PubSCIENCE is that its citations are compiled in a new way! Collaborating publishers contribute their citations based on agreements negotiated with my Office.

PubSCIENCE allows the patron to search across abstracts and citations of multiple publishers at no cost to the patron. The patron need not know ahead of time which journal has the information she seeks. Once the patron has found an interesting abstract, a hyperlink provides access to the publisher's server to obtain the full text article. The article will come up immediately if the patron or his/her organization has a subscription to the journal. If the patron lacks such a subscription, access to the full text can be obtained by pay per view, by special arrangement with the publisher, library access or through commercial providers.

OSTI's primary patrons are scientists at the DOE system of National Laboratories. PubSCIENCE is particularly attractive to such large institutions, as they are increasingly using site licenses to bring full-text journals to their scientific staffs. For example, Los Alamos National Laboratory has site licenses to well over 2,000 journals. At any institution that has a site license hosted at a publisher's server, the hyperlinks to full-text in PubSCIENCE are automatically live.

Right now, PubSCIENCE has 1,048 different journals with bibliographic records for 1.8 million articles.

It is available to the public by a collaboration with the Government Printing Office.

Working in partnership with GPO and 21 publishers, PubSCIENCE was unveiled at a ribbon-cutting event with the support of Energy Secretary Bill Richardson and the Superintendent of Documents Fran Buckley.

For the future, OSTI plans to continue to partner with journal publishers to add more titles to PubSCIENCE, consistent with the scope of the DOE R&D program.

Last year, the President's Information Technology Advisory Committee (PITAC) presented a report envisioning the ways in which information technology could transform how we conduct research. The committee foresaw a time when all scientific and technical journals will be available online and completely searchable. PubSCIENCE is a step toward realizing that committee's vision.

DOE Information Bridge

The progression to electronic dissemination has been under way at DOE for some time. The first significant advance actually occurred in late 1997 with grey literature. That was when we expanded access from bibliographic data to full text for grey literature and then made it available on the Internet free of charge. I am speaking of the DOE Information Bridge, which was introduced to the public in April 1998 in partnership with GPO. By scanning full text of DOE-sponsored grey literature including technical reports and conference papers, each word of each report is searchable. There are now over 55,000 digital items and over 3.5 million searchable pages. In 1998 DOE was among the first to undertake such a significant change in direction. Other agencies are doing this, too. In particular, NASA is now routinely making documents available on the NASA Technical Report Servers, and EPA and DTIC have web sites available as well. We are now working together to form a "Gray Literature" network that would provide one-stop access to these as well as DOE's.

Science Communication Trilogy

With the addition of preprints to OSTI's suite of Web products and services, the trilogy of ways by which researchers make their results known is now accessible on the Web:

grey literature (noncommercially published literature), through the DOE Information Bridge;

journal literature, through PubSCIENCE; and

preprints, through the PrePRINT Network.

Each of these is a vast virtual collection. Whereas a few years ago scientists communicated their findings primarily by two methods, grey literature and journal literature, they now have preprints as an increasingly popular third way to communicate.

My personal view is that this mix of three ways by which scientists communicate their findings will persist far into the future.

I like to tell folks that our aims are simple. We aim to be FIRST in grey literature, FIRST in journal literature, FIRST in preprints, and FIRST in the hearts of our researchers.

Each way has its own set of strengths and weaknesses. That is why we at DOE have determined not to mix products. Journals are kept separate from grey literature, and both are kept separate from preprints. Users have a distributed search system to pulse all systems with one query, if they choose, but we do not want users to lose sight of the type of literature they are viewing.

If users desire, we do offer a distributed search. OSTI recognized early on that distributed searching was the "Holy Grail" (a quote taken from a Federal Computer News article on EnergyPortal) of the Information Age. The 25 most popular of EnergyFiles' 500-plus databases, including DOE databases, are integrated into the EnergyPortal search, thus providing a real one-stop shopping interface to diverse collections residing around the world. Try it sometime ... You'll be amazed at the quick response time and functionality.

Where Is This All Leading?

The progress we have achieved so far has made us think about institutionalizing all this. We are exploring a concept for a Future Information Infrastructure for the Physical Sciences. The topic is being discussed by our various stakeholders and partners. Many attributes of the currently established National libraries are already in place at DOE in regard to the collection and management of scientific and technical information. We would like to establish the importance of this information to the nation and guarantee information preservation for future generations.

A key concern of the information community is permanent public access to information.

Such an infrastructure is the best way to promote permanent public access to government information, a place where researchers, educators, students and citizens can come for answers. Our vision is to have DOE National Laboratories, Program Offices, GPO, other federal agencies, universities, publishers, and libraries working together to accomplish mutual goals in the advancement of science. Given our recent history and our dedication and drive, I believe this can be done.