Building Knowledge Assets for the Advancement of Science

Dr. Walter L. Warnick, Director
DOE Office of Scientific and Technical Information

April 12, 2000

E-Gov Knowledge Management Conference
Washington, DC

[Building Knowledge Assets for the Advancement of Science]

My Office of Scientific and Technical Information, OSTI, has been managing DOE's technical information program for over 53 years. The primary focus at OSTI is collecting, disseminating, and preserving scientific and technical information for DOE and the scientific community.

I confess that I had some difficulty at first figuring out what the phrase "knowledge management" means. The term has not yet caught on in scientific circles, at least as far as I know.

As I read more and more about knowledge management, I came to realize that it is a new name for what the science community has been doing for a long, long time. In fact, a working definition of science might be, simply, the management of knowledge resulting from observational and experimental evidence. One could well argue that the science community has been doing knowledge management for centuries.

When scientists do science, the principal deliverable is information. The more science is done, the more information is accumulated.

But science knowledge is more than the mere accumulation of information. The information must be communicated to the people who need it.

But science knowledge is still more. There had to be a way to evaluate the reliability and importance of information. This necessitated careful methods of peer review, and the birth of scholarly publishing. This began, literally, centuries ago.

Only that information that survives peer review is deemed worthy, and it is only this fraction of information that is assimilated and integrated into knowledge.

Yes, communication of reliable information is the building block of science knowledge, but to develop knowledge, information must be integrated. That poses a logistical difficulty. The trouble has always been too much information to digest. No single person could be expected to be aware of all, or even a significant portion, of science output. There had to be a way for those who need specific information to find those specific items from among the huge mass. Thus were born careful methods to create subject categories, key words, abstracts, indices, which, altogether, are now termed metadata. The careful compilation of metadata has been going on for many, many decades, although the term metadata is relatively new.

So science knowledge management involves accumulation of information, communication of information, evaluation of information, and some mechanism by which it can be organized and retrieved so that it can be integrated. These are precisely the functions described by Joe Firestone in the previous session.

Science knowledge increases over time, monotonically, so that each generation has access to, and uses, more science knowledge than the previous generation.

The system worked.

Enter the Information Age. In no way has the purpose of scientific knowledge management changed, but the Information Age is very much changing everything about the mechanisms by which science information is accumulated, communicated, evaluated, organized, and retrieved so that it may be integrated to become knowledge.

I am here today to tell you how Information Age technology is transforming science knowledge management at DOE. In many ways, we think we are at the forefront of change.

[Bill Richardson Quote]

First, let me tell you a bit more about the Department of Energy. Many people do not realize how diverse the Department of Energy is and how vital it is to the research base of the United States.

The Department invests $7 billion annually in its R&D programs. The primary thrust is the physical sciences, such as physics, materials, and geology.

The R&D programs are performed by a system of National Laboratories - including Brookhaven, Argonne, and Oak Ridge National Labs. There is a group here from ORNL.

OSTI's mission has been the same throughout the years - to collect, disseminate, preserve and manage this information. It has always been the case that the more people who have access to our R&D results, the better. However, the manner in which we do business has changed drastically. We are adopting Information Age technologies as rapidly as we possibly can.

The change in technology has allowed us to reach a much greater share of our audience. In the old Paper Age, the audience for our R&D results was primarily DOE itself, with small forays into the public. Now, we have gone from being DOE-focused to sharing knowledge with everyone.

[The universe of DOE scientific and technical information collections]

In the past, DOE produced a comprehensive bibliographic database, Nuclear Science Abstracts. Here is a copy of the NSA publication.

Bibliographic data in Nuclear Science Abstracts not only led the user to specific information, but RELIABLE information, suitable to be integrated and become knowledge.

OSTI's entire operation at one time revolved around this one product. It was comprehensive in the nuclear sciences.

Today, OSTI has an entire universe of information dissemination collections. They have all been developed over the past few years and are available to the public over the Internet.

[The trilogy of scientific and technical communication]

There are three main ways by which scientists communicate their findings: journal literature, preprint literature, and technical report literature, also called grey literature. For each of the three ways, we have developed a vast virtual collection and made it available free on the Web.

We call this the trilogy of scientific and technical communication. For journal literature, we have PubSCIENCE. For preprint literature, we have developed the PrePRINT Network. And for grey literature, we have developed the DOE Information Bridge. Using these three products, either individually or collectively through a distributed search engine, researchers and other patrons can access research results in their field.

We are close to achieving what was only recently an impossible dream: conquering text in the physical sciences. We are not there quite yet, but we are very close. The foundation has already been laid.

[PubSCIENCE]

PubSCIENCE is a large compendium of peer-reviewed journal literature with a focus on the physical sciences and other related disciplines. We modeled it on PubMed, the National Library of Medicine's life sciences product. PubSCIENCE allows patrons to search across a collection of journal articles. Right now, there are 1,037 different journals with bibliographic records for 1.7 million articles. The patron need not know ahead of time which journal has the article he wants. PubSCIENCE receives the bibliographic records free from publishers. The PubSCIENCE database integrates publisher-submitted citations and abstracts into one searchable database. The database then uses hyperlinks to take users to the publishers' doorstep for the full text.
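As a rough illustration of the arrangement just described, the sketch below merges citation records from several publishers into one searchable list and returns a link to the publisher's site for the full text. It is a minimal sketch, not the PubSCIENCE implementation: the `Citation` and `CitationIndex` names, the sample records, and the example.org URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    journal: str
    title: str
    abstract: str
    full_text_url: str  # hypothetical link to the publisher's own site


class CitationIndex:
    """One searchable pool of citations submitted by many publishers."""

    def __init__(self):
        self.records = []

    def ingest(self, publisher_records):
        # Each publisher sends its own batch; all batches share one pool.
        self.records.extend(publisher_records)

    def search(self, term):
        # Match the term against title and abstract, ignoring case.
        t = term.lower()
        return [
            r for r in self.records
            if t in r.title.lower() or t in r.abstract.lower()
        ]


index = CitationIndex()
# Illustrative batches from two made-up publishers.
index.ingest([
    Citation("Journal A", "Superconducting thin films",
             "Growth and characterization.", "https://example.org/a/1"),
])
index.ingest([
    Citation("Journal B", "Seismic imaging methods",
             "Applied to thin films of sediment.", "https://example.org/b/7"),
])

hits = index.search("thin films")
links = [h.full_text_url for h in hits]
```

The patron searches once and never needs to know which journal holds the article; the index holds only citations and abstracts, and the full text stays with the publisher.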

PubSCIENCE thus makes peer-reviewed journal literature accessible to the desktops of researchers as well as to the general public. PubSCIENCE was unveiled in October 1999 at a ribbon-cutting ceremony in Washington by Secretary of Energy Richardson. It has been well received by patrons.

[The PrePrint Network]

Preprints are widely used in certain scientific fields, particularly in physics and math. They are routinely posted directly by the author on a preprint server. My office does not operate any preprint servers. Rather, we have developed the PrePRINT Network, which is a searchable gateway to 800 preprint servers run by other folks. Often, preprints are not peer-reviewed. In fact, their creation and posting often begin the peer comment process in a public forum. Our PrePRINT Network went live on January 31, 2000. As opposed to PubSCIENCE, where publishers send their citations to a central location, the PrePRINT Network is a gateway to diverse preprint servers around the world. These servers are then searched in parallel by a distributed search engine. Our approach to preprints is to leave the information hosted at its own site but provide an integrated search and retrieval system across the sites. More about this search tool in a few minutes.
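The idea of searching many independently hosted servers in parallel can be sketched in a few lines. This is only an illustration under stated assumptions, not the search engine OSTI uses: the `SERVERS` table stands in for remote preprint sites, and the server names, titles, and function names are hypothetical. A real gateway would send the query to each site over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote preprint servers: each name maps to
# the titles that server hosts. The documents stay on their own site.
SERVERS = {
    "server-a": ["Quark confinement on the lattice", "Notes on knot theory"],
    "server-b": ["Lattice methods for plasma physics"],
    "server-c": ["Prime gaps and sieve methods"],
}


def search_one(server, query):
    """Search a single server and tag each hit with where it came from."""
    q = query.lower()
    return [(server, title) for title in SERVERS[server] if q in title.lower()]


def distributed_search(query):
    """Send the query to every server in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        per_server = pool.map(lambda s: search_one(s, query), SERVERS)
        # map() yields results in the order the servers were listed.
        return [hit for hits in per_server for hit in hits]


results = distributed_search("lattice")
```

Each hit carries the name of the server it came from, so the patron is sent to the hosting site for the preprint itself rather than to a central copy.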

[The DOE Information Bridge]

The progression to electronic dissemination has been underway at DOE for some time. The most significant advance occurred when access was expanded from bibliographic data to full text for grey literature and then made available on the Internet free of charge. I am speaking of the DOE Information Bridge, which was introduced in April 1998 in partnership with GPO. Because we scan the full text of DOE-sponsored grey literature, including technical reports and conference papers, each word of each report is searchable. There are now over 67,000 digital items and over 4 million searchable pages. In 1998 DOE was among the first to undertake such a significant change in direction. This project established DOE as a trendsetter in the dissemination of government information. It also began leading us down a path from which we have never looked back ... the path of reinvention.
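One common way to make every word of every page searchable is an inverted index, which maps each word to the pages on which it appears. The sketch below shows that general technique; it is an assumption for illustration, not a description of how the DOE Information Bridge is built, and the report identifiers and page text are made up.

```python
import re
from collections import defaultdict


def build_index(reports):
    """Map each word to the set of (report id, page number) where it occurs."""
    index = defaultdict(set)
    for report_id, pages in reports.items():
        for page_no, text in enumerate(pages, start=1):
            # Split the page text into lowercase words.
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((report_id, page_no))
    return index


# Hypothetical scanned reports: each is a list of page texts.
reports = {
    "RPT-001": ["Neutron flux measurements", "Results for the reactor core"],
    "RPT-002": ["Geology of the reactor site"],
}

index = build_index(reports)
hits = sorted(index["reactor"])
```

A search for a single word then becomes a lookup that returns the exact report and page, which is what distinguishes full-text access from a search over bibliographic data alone.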

Our aims are simple. We aim to be FIRST in grey literature, FIRST in journal literature, FIRST in preprints, and FIRST in the hearts of our researchers.

[The EnergyFiles Virtual Library Collections of Energy, Science, and Technology]

Finally, we have brought all of this together with other electronic collections under our EnergyFiles Virtual Library Collection of Energy, Science, and Technology. The library was built several years ago, but a giant step was taken one year ago when we made it searchable through the distributed search engine I mentioned earlier. This is a Portal, as discussed this morning by Ed Vitalos. To our knowledge, EnergyPortal, the EnergyFiles distributed search, was the first distributed search applied to a government collection. OSTI recognized early on that distributed searching was the "Holy Grail" (a quote taken from a Federal Computer News article on EnergyPortal) of the Information Age. The 25 most popular of EnergyFiles' 450-plus databases, including DOE databases, are integrated into the EnergyPortal search, providing a real one-stop shopping interface to diverse collections residing around the world. Try it sometime ... You'll be amazed at the quick response time and functionality.

[Others have noticed]

We at OSTI, and indeed those within the Department of Energy, believe we have responded proactively and effectively to the new electronic world while remaining true to the needs of the scientific community. Others have noticed as well. This viewgraph shows some of the awards and recognition OSTI has achieved for its multiple initiatives. Just to highlight a couple, OSTI has won 5 of Vice President Al Gore's Hammer Awards for re-inventing government. I quote, "Hammer Awards go to teams who have shown large impacts on customer service, bottom-line results, streamlining government, saving money and exemplary achievements in government problem-solving." In addition, OSTI was invited to submit 3 articles to the inaugural issue of Access America, Vice President Gore's online publication for re-inventing government. The Government Printing Office has bestowed two commendations on OSTI, another record, for improving public access to governmental information.

[Where Is This All Leading?]

The progress we have achieved so far has made us think about institutionalizing all this. We are calling this exploratory initiative DOE's National Library Initiative, focused on Energy, Science, and Technology. Many attributes of the currently established national libraries are already in place at DOE in regard to the collection and management of scientific and technical information. We would like to establish the importance of this information to the nation and guarantee knowledge management for future generations. A national library is the best way to promote permanent public access to government information, a place where researchers, educators, students and citizens can come for answers. In support of this, our vision is to have DOE National Laboratories, Program Offices, GPO, other federal agencies, universities, and libraries working together to accomplish mutual goals in the advancement of science. Given OSTI's recent history, I believe this can be done.