Enabling Scientific Discovery through a Science Information Infrastructure

Walter L. Warnick, Ph.D., Director
DOE Office of Scientific and Technical Information
http://www.osti.gov
March 18, 2002
AITC 2002

Slide Presentation


[Slide 1/Intro - 'right info, place, time' quote]

[Slide 2 - 'shared knowledge' quote]

Traditional Mission

All scientists well appreciate that scientific progress is enabled only if knowledge is shared. Isaac Newton said it best 300 years ago, "If I have been able to see further, it was only because I stood on the shoulders of giants." Newton credited his own incredible advances to the knowledge he obtained from others. Today, the same notion is reflected in numerous laws that make the sharing of knowledge a mission of DOE. My Office of Scientific and Technical Information (OSTI), in collaboration with many of you here today, does its best to advance that part of the DOE mission.

Sometimes we in DOE tend to lose sight of the fact that, at the most fundamental level, the purpose of the DOE R&D enterprise is not to fund researchers. Rather, the fundamental purpose of the DOE R&D enterprise is to make progress on key science topics. Funding researchers is but the main mechanism by which the more fundamental goal of science progress is pursued.

Researchers explore science questions and make discoveries. But the R&D cycle is incomplete until the knowledge thus gained is shared. The STI community here at this conference assists in the sharing of knowledge.

New Way

Periodically throughout history, technological developments have come along that have transformed society: the railroads, the telephone, electricity, the automobile. I believe that we are now on the front edge of one of those transformational technologies: the Internet.

I say "front edge," for what we have seen so far is barely the beginning. As broadband becomes ubiquitous--and by broadband I mean 100 megabits per second, not the 1.5 megabits per second which is the speed of DSL--its impact will be profound in ways that cannot now be fully anticipated. It is impossible to predict how millions of intelligent people will apply this wonderful new technology to the problems they face. All kinds of human interactions will be changed, from the location of the workplace, to the design of the workplace, to all manner of education and entertainment. The way that scientists share knowledge will also be transformed.

This is where all of us come in. The scientific and technical knowledge community is working toward the new future. We are building on our traditions and proceeding one step at a time. Together, we are making progress that I will review with you today.

We hold these truths to be self-evident: Scientific progress is our common goal; and science cannot progress unless knowledge is first shared. Today I will talk about ways that OSTI and the Department's scientific and technical information (STI) community are working together to share knowledge, taking advantage of new technology. The nature of our business is changing faster and more profoundly than any other part of the DOE mission.

[Slide 3 - science.gov's Web debut]

Science.Gov: A Shared Vision

Earlier this month, March 2002, DOE and nine other Federal science agencies took a major step in information technology deployment when the interagency science.gov Web site went online in test mode. Championed by OSTI, science.gov provides a gateway to authoritative science information of U.S. Government agencies, including research and development results. It serves as the official FirstGov for Science portal in support of the Administration's e-government initiatives.

A formal launch is scheduled for later this spring. Science.gov will raise the visibility and use of the results of DOE's R&D.

Science.gov began as a vision and gained momentum as ten Federal science agencies first began collaborating and then persevered in making the vision a reality. This is how it happened.

[Slide 4 - foundation/workshops]

In 2000, OSTI sponsored a workshop chaired by Alvin Trivelpiece and hosted at the National Academy of Sciences for furthering the vision for a future physical sciences information infrastructure. We at DOE believed that the deployment of current technology could create an inexpensive network of dispersed science resources. From that workshop, in which experts from science as well as information professions familiar with science communications participated, came a report that yielded a high-level vision.

Following the workshop, it was agreed that an interagency strategy was needed to achieve the vision. As various science agencies had much to offer, the vision shifted from the 'physical sciences' to the broader 'science'. Each agency would contribute content and other resources consistent with its mission. For DOE, our strength is the wealth of scientific and technical information, mostly in the physical sciences, that the DOE STI community, of which many of you are a part, has contributed over the last 50 years. DOE also is a leader in deploying new Internet tools.

To translate the Trivelpiece vision into a working strategy, a workshop was held at NIST in the spring of 2001. Federal agency representatives recognized the tremendous opportunities for making Federal agencies' science information more accessible and the value of making science information resources more useful to researchers, teachers, and learners wherever they are located. In particular, the concept of science.gov was endorsed as the interagency science portal or gateway whereby the agencies would collectively serve the science-attentive citizen. No longer would patrons need to know ahead of time which agency held which information. Recognizing that "the building blocks are available now" and there is "no need to wait - no need to experiment," the principal science agencies agreed to form the Science.gov Alliance.

Present time. As noted previously, Science.gov was launched in test mode earlier this month. Hosted and maintained by OSTI, the result is a science information infrastructure vision realized. Science.gov is the new interagency gateway to the science information of several Federal agencies. It is the recognized FirstGov for Science portal.

The current incarnation of Science.gov is but the latest step toward a science information infrastructure. There is much more to be done.

[Slide 5 - Nature quote]

What We Have Accomplished

Evidence has shown that usage increases when access is more convenient. As Nature magazine noted:

"Articles freely available online are more highly cited. For greater impact and faster scientific progress, authors and publishers should aim to make research easy to access."

Nature, Vol. 411, No. 6837, p. 521, 2001

Making research easy to access has been our major motivation in the development of Science.gov.

I pause here a moment to add that the 9/11 attack has made us more aware than ever that there is a downside to information accessibility. Whereas we once asked whether information might be useful to hostile governments, and, if it was, we classified it, now we ask if it might be useful to terrorists.

Such concerns have caused DOE to limit public access to a quantity of information from the Internet. There is much controversy about this. Various senior elements of the government are working on giving us guidance.

In balance with these concerns is the Department's ongoing commitment to providing access to useful scientific and technical information resulting from the Agency's enormous investment in R&D. In honoring this commitment, we are pleased to celebrate success.

[Slide 6 - Success]

Once relying primarily on paper-based processes, the DOE STI community, of which many of you are a part, has successfully deployed digital technologies. We no longer use paper and microfiche for sharing R&D results. Now we use the Internet and digital technologies to assure STI is available at the desktop, free of charge, and accessible not only to librarians but also to scientists, researchers, students and teachers, industry, and the science-attentive public.

The transition to an electronic environment was adopted as a DOE goal in 1997, not just for OSTI but for the whole Departmental complex that creates STI. In that year, an agency-wide panel prepared a strategic plan. In addition to developing Web accessible information, we pushed for the input-to-output process to become fully electronic.

All of the DOE laboratories, over 25 in all, now provide their technical reports in electronic formats. DOE has also adopted fully electronic reporting for the financial assistance grants and contracts, with over 7000 active awards annually.

We have worked aggressively and collaboratively to achieve this goal. It was not a serendipitous occurrence; it required focused, coordinated efforts. I am proud to announce that last year, in 2001, the Department achieved the "all electronic" goal three years ahead of the original target date. In fact, while we are here in Denver, those who have achieved this great accomplishment will be acknowledged at an event following this conference.

Furthering our electronic success, OSTI has streamlined metadata elements through the use of the Dublin Core Metadata Elements. Later this session, Mary Donahue of the National Renewable Energy Laboratory and Rita Hohenbrink of OSTI will talk more about how these streamlined elements are now being captured from NREL by OSTI using a harvesting approach (see presentation).
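As a rough illustration of what a streamlined, Dublin Core based record might look like, the sketch below serializes a report description using the fifteen standard Dublin Core element names. The function name `to_dublin_core`, the `<record>` wrapper element, and all sample values are assumptions made for this example; this is not the actual OSTI or NREL record format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen element names defined by the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dublin_core(record):
    """Serialize a dict of element name -> value (or list of values) as XML.

    Hypothetical helper: the <record> wrapper is an assumption, not a
    format defined by OSTI.
    """
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, values in record.items():
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of values
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values, for illustration only.
sample = {
    "title": "Example Technical Report",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2001-10-01",
    "format": "application/pdf",
}
dc_xml = to_dublin_core(sample)
print(dc_xml)
```

Restricting records to a small, shared element set is what makes them easy to capture automatically: a harvester only needs to understand these fifteen names, whatever system produced the record.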

[Slide 7 - OSTI Web Tools Collage]

OSTI has developed and deployed a series of "E-Government" tools. Along with the DOE STI community, OSTI annually increases the Department's vast store of science and technology information accessible via the Web. In FY 01, we added over 200,000 new records to Web tools, including searchable full text of an additional 16,000 scientific and technical documents reporting DOE-sponsored research results.

Time does not permit me to review OSTI's Web Tools, nor those of the Labs and other DOE organizations, but you may see the former at OSTI's Exhibit during lunches and breaks in the Colorado Ballroom, corner space 10.

I will, however, mention one of the technologies OSTI is using to improve access, retrievability and discovery of science information in several of its Web resources.

[Slide 8 - deep web searching]

While most search tools focus on those web pages that are easily found and indexed by traditional web crawlers (called the "surface web"), OSTI is primarily focused on the "deep web". The term "deep web" refers to a vast repository of underlying content, such as documents in online databases, that general-purpose web crawlers cannot reach. Estimated to be 500 times larger than the surface web, deep web content has remained mostly untapped due to the limitations of traditional search engines.

OSTI's web resources include an estimated 100 surface web pages, but over 8 million deep web pages! To support the search and retrieval of deep web content, OSTI applied a novel distributed search tool. Distributed Explorit, a directed query engine application, was developed by Abe Lederman of Deep Web Technologies in collaboration with OSTI. Mr. Lederman's Explorit has since served as the cornerstone for additional OSTI Web tools requiring deep web searches and also serves as the search tool for Science.gov.

Bill Arms, who is one of the leading figures in information management, has noted that simple search algorithms applied to enormous collections can be a tremendous aid to human thought. Explorit is just such an algorithm.

By using this innovative technology, it no longer matters where the information resides or what format it is in, and the patron need not know which agency posted the information. These factors no longer pose barriers to the process of information discovery. In a presentation tomorrow afternoon, Valerie Allen of OSTI will also discuss Deep Web searching, and tell about its application in science.gov (see presentation).
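The general idea of a directed query engine, as described above, can be sketched in a few lines: one query is sent to several separate collections at the same time, and the results are merged into a single ranked list. This is only a minimal sketch of the generic technique, not Explorit's actual implementation. The names `make_source` and `federated_search`, the two in-memory collections, and the simple term-count scoring are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, documents):
    """Build a search function over one collection (hypothetical stand-in
    for a database that a web crawler could not index directly)."""
    def search(query):
        terms = query.lower().split()
        hits = []
        for doc in documents:
            text = doc.lower()
            # Naive relevance score: total occurrences of the query terms.
            score = sum(text.count(term) for term in terms)
            if score:
                hits.append({"source": name, "title": doc, "score": score})
        return hits
    return search


def federated_search(query, sources, limit=10):
    """Send one query to every source in parallel and merge the results."""
    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda source: source(query), sources))
    merged = [hit for hits in result_lists for hit in hits]
    # Highest score first; ties broken by source and title for stable output.
    merged.sort(key=lambda h: (-h["score"], h["source"], h["title"]))
    return merged[:limit]


# Invented sample collections, for illustration only.
sources = [
    make_source("agency_a", ["Solar cell efficiency measurements",
                             "Superconducting magnet design"]),
    make_source("agency_b", ["Solar flare observations",
                             "Solar cell degradation in solar arrays"]),
]
results = federated_search("solar cell", sources)
for hit in results:
    print(hit["score"], hit["source"], hit["title"])
```

The patron issues a single query and sees one merged list; which collection held each document is a detail of the result, not something the patron had to know in advance.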

[Slide 9 - OAI]

What Next?

Looming on the horizon are arrangements like the Open Archives Initiative (OAI), which enlists simple metadata standards and Extensible Markup Language (XML) to allow information to be repurposed easily.

The next presenter in this session, Jackie Stack of Los Alamos National Laboratory, will provide more background and technical information about OAI (see presentation). However, I offer this about OAI's purpose and promise.


OAI provides an easy way to implement and deploy mechanisms to harvest and disseminate information. Further, OAI is designed to provide a low-barrier approach to interoperability among publishers and numerous repositories of information.

By agreeing to common standards, one repository can be part of a larger community of repositories, thus increasing the value of the individual repository and the entire community of repositories.
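To make the harvesting idea concrete, the sketch below builds a harvest request and reads titles and creators out of a response. It uses the element names and namespaces of version 2.0 of the OAI Protocol for Metadata Harvesting, which may differ from the version current at the time of this talk. The repository address, the helper names `list_records_url` and `parse_records`, and the sample response are assumptions made for this example; no real repository is contacted.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used in OAI-PMH 2.0 responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, title, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier",
                                       namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [e.text for e in rec.findall(".//dc:creator", NS)],
        })
    return records


# Hypothetical base URL; a real harvester would fetch this address.
url = list_records_url("https://repository.example.org/oai")

# Invented sample response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:report-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Technical Report</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

harvested = parse_records(SAMPLE_RESPONSE)
print(url)
print(harvested)
```

Because every participating repository answers the same request in the same format, a harvester written once can collect records from any of them. A complete harvester would also need to handle the protocol's paging and error responses, which are omitted here.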

Historically, each science.gov federal agency has used different formats and technologies intended mainly for its own digital libraries. Typically, these agency-specific systems are not designed to facilitate inter-agency information discovery or information sharing. This is a fundamental interoperability problem. It will not be resolved until we find a way for content owners, the federal agencies, to work together to harvest, share and archive our information resources. We need to adopt standards that will promote the required interoperability.

Thus my interest in OAI and its usefulness in furthering the discovery of scientific information through the Science.gov alliance.

OAI deserves a serious look as a standard protocol for government agencies to harvest, share, and reuse information to enable good science and promote our Departmental and national interests.

[Slide 10 - 'discovery' quote]

In Closing

Having a lot of data is not the same thing as having the answers you need. Science.gov, the comprehensive information infrastructure for the physical sciences, is an important step toward fulfilling the need to have the right information at the right time and in the right format. Again, making this happen is no fluke. It takes collaboration and input on the part of many. The DOE STI partnership, the DOE IT community, and the Office of Scientific and Technical Information are making their contributions in advancing the state of the art in using the web to disseminate information resulting from DOE's R&D.

We have already come a long way. We have made progress by working together, both within DOE and with other agencies. On the other hand, the knowledge sharing business is changing fast, and there is much more to be done. Now that the first major steps are complete, our challenge is to select properly among the myriad choices for our future direction. The promise of greater access, retrievability, and discovery of more and more knowledge is too great to be denied. While none of us in the STI community have perfect foresight, our prospects for the future are best if we continue to work together. I pledge OSTI's support to this end.