Building Energy Science and Technology Digital Collections for a Physical Sciences Information Infrastructure
Karen J. Spence
Abstract
The U.S. Department of Energy (DOE) Office of Scientific
and Technical Information (OSTI) provides a suite of innovative digital information
resources for a stronger America.
Included in these resources are world-class products that address the
three main ways by which researchers disseminate their findings: the DOE
Information Bridge (gray literature), PubSCIENCE (journal literature),
and the PrePRINT Network (preprints).
These products are key components of the suite of resources provided
through EnergyFiles, a virtual library of energy-related scientific and
technical information. Each product can
be searched individually or in parallel with other energy-related resources
using EnergyPortal, which is the groundbreaking distributed search
mechanism of EnergyFiles. This
history of success lays the foundation for OSTI’s new initiative, a future Physical
Sciences Information Infrastructure.
Introduction
The Department of Energy (DOE) is among the leading research agencies in the world, investing $7 billion annually in research and development (R&D). It is vitally important for research agencies to disseminate their information as broadly and as quickly as possible, providing access to the data and information that fuel essential knowledge. For over 50 years, DOE’s Office of Scientific and Technical Information (OSTI) has been collecting, preserving, and disseminating the Department’s scientific and technical information (STI). Using Information Age technologies, OSTI has radically changed its information services and has developed a suite of award-winning Internet resources that bring science information to the desktop at no cost to the user. These resources give scientists, researchers, academia, industry, and the public easier, faster, cheaper, more complete, and more convenient means of accessing and using global STI.
One-stop access to this suite of resources is
provided through EnergyFiles, a Web-based virtual library that offers
easy access to collections of both DOE and worldwide energy-related scientific
and technical information. EnergyFiles
contains a Deep Web search mechanism, EnergyPortal, that is easy to use,
integrates parallel searching, and retrieves information from heterogeneous and
geographically dispersed databases and Web sites. Key components of EnergyFiles are DOE R&D Project
Summaries (current research), DOE R&D Accomplishments (outcomes
of past research), the DOE Information Bridge (gray literature),
PubSCIENCE (peer-reviewed journal literature), and the PrePRINT Network
(preprints). These products have been
designed to provide remote access to billions of dollars of energy‑related
research performed by DOE and its collaborators.
DOE Information Bridge
The DOE Information Bridge (http://www.osti.gov/bridge), made
available in April 1998 in collaboration with the U.S. Government Printing Office
(GPO), contains DOE report literature from 1995 forward. It incorporates over 60,000 full-text
reports comprising almost 5 million pages.
It provides free, convenient, and quick access to full‑text DOE
research and development reports in physics, chemistry, materials, biology,
environmental sciences, energy technologies, engineering, computer and
information science, renewable energy, and other topics. Users remotely access and download the
reports free of charge and in significant volume.
The DOE Information Bridge focuses on providing
access to scientific and technical reports produced by DOE, DOE national
laboratories, and DOE contractors. New
reports processed by OSTI are added routinely and legacy reports are added as
resources permit. Since its
introduction, the content of DOE Information Bridge has more than
doubled.
DOE Information Bridge search options include a basic approach that can be
concentrated on specific data fields and an advanced approach that includes
Boolean operators to increase search precision. Users can search the entire collection (full text and
bibliographic data), or they can search portions of it. Three formats, GIF, PDF and TIFF, are available
for viewing the full‑text page images.
Two formats, PDF (image only) and the original input format, are
available for downloading full‑text documents. This makes reports far easier to use and eliminates the
cumbersome and time-consuming practices associated with searching traditional
media.
Through the use of unique identifiers known as
Persistent URLs (PURLs), DOE Information Bridge makes it possible for
educators, students, scientists, and engineers to directly access individual
documents and to easily direct others to them.
Other new capabilities include a variety of search-result sorting
options, extensive date-range searching, and an option to view or download
full-text documents in Web or native formats.
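The idea behind a persistent identifier can be sketched in a few lines. This is only an illustration of the general technique, not a description of how the DOE Information Bridge implements PURLs: the `PurlResolver` class, the identifier `purl/12345`, and the example.org locations are all hypothetical. The point is that the cited identifier never changes, while the location it resolves to can be updated when a document moves.

```python
class PurlResolver:
    """Toy resolver: a persistent ID stays fixed while its target may move."""

    def __init__(self):
        self._targets = {}  # persistent ID -> current document location

    def register(self, purl_id, location):
        """Create or update the location a persistent ID points to."""
        self._targets[purl_id] = location

    def resolve(self, purl_id):
        """Return the current location, or None if the ID is unknown."""
        return self._targets.get(purl_id)


resolver = PurlResolver()
# Hypothetical ID and locations, for illustration only.
resolver.register("purl/12345", "https://reports.example.org/1998/12345.pdf")
# The document moves; only the resolver table changes, not the cited ID.
resolver.register("purl/12345", "https://archive.example.org/12345.pdf")
print(resolver.resolve("purl/12345"))
```

Anyone who bookmarked or cited the identifier before the move still reaches the document afterward, which is what makes it practical to direct others to an individual report.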
The DOE Information Bridge is made available to
the public through a partnership between OSTI and the Government Printing
Office (GPO) on GPO Access (http://www.access.gpo.gov/su_docs). Building
and expanding the DOE Information Bridge reinforces DOE’s and GPO’s
commitment to make available DOE research reports and to move Federal programs
and activities into the ever-expanding world of the Information Age.
PubSCIENCE
PubSCIENCE (http://www.osti.gov/pubscience) was
developed to facilitate searching and accessing peer-reviewed journal
literature in the physical sciences and other disciplines of interest to
DOE. Made available in collaboration
with the Government Printing Office (GPO) in October 1999, it provides for
quick, easy, and free searching of a compendium of peer-reviewed journal
citations and abstracts about the physical sciences and other energy-related
disciplines. Hyperlinks provide access
to publisher servers to obtain full-text articles if the user or organization has
a subscription to the journal. If the
user lacks such a subscription, access to the full text can be obtained by pay
per view, by special arrangement with the publisher, by library access, or
through commercial providers.
Forty-one publisher agreements give PubSCIENCE
users the ability to search and access almost 2 million records in more than
1,300 journal titles of peer-reviewed scientific and technical information.
PubSCIENCE represents the convergence of recent advances in information technology tools (as
evidenced by the Internet), the re-engineering of traditional DOE products and
services, the growing interest of scientific journal publishers in using
the Internet, the information needs of the DOE research community, and the
desire of the GPO to work with other agencies to make electronic government
information and tools available to the public.
The Internet is changing not only the way publishers
think about publishing but also how government views its
role in the dissemination of scientific and technical information. PubSCIENCE is an outstanding example
of converging interests: users’ desire to access current scientific and
technical literature, the Department’s desire to facilitate the flow of
peer-reviewed scientific and technical information, and publishers’ interest in
obtaining the widest possible visibility for their published materials.
PrePRINT Network
The PrePRINT Network (http://www.osti.gov/preprint)
was unveiled in January 2000 and is a searchable gateway to preprint sites that
contain information about scientific and technical disciplines of concern to
DOE. Such disciplines include physics,
materials, chemistry, and portions of biology, environmental sciences, and
nuclear medicine. Collections and
resources in the PrePRINT Network are provided by academic institutions,
government research laboratories, scientific societies, private research
organizations, and individual scientists and researchers.
The PrePRINT Network expedites the dissemination
of scientists’ research results. It is
Web-based and provides access to energy-related papers, draft journal articles,
and other electronic materials produced by researchers. It provides links to more than 2,000 preprint
sites housing over 340,000 documents.
Over twenty heterogeneous preprint databases are available for
distributed cross searching via a single query. In addition, the PrePRINT Network provides links to over
600 related scientific societies and associations.
The PrePRINT Network offers users three options
for locating information. Users can
browse or search a single preprint site or a selected set of sites. The first option, Browse, lets users view an
alphabetical listing of all of the sites included in the system and visit
any of the individual sites listed. The
second option, Search Selected
Sites, lets users send a single query to the search engines of selected preprint sites;
the results are then compiled and returned to the user. The third option, Subject Pathways, lets
users choose a subject area and browse preprint collections,
including preprints posted by individual scientists on their own sites. In most cases, access to the full-text
information on the target sites is open and free of charge.
OSTI recently launched PrePRINT Alerts, a
component of the PrePRINT Network and the first alert service that
harvests information from the Deep Web.
The underlying content of select Web sites and databases is searched
rather than only surface pages of Web sites.
This new capability allows users to register, create their personalized
profiles, and automatically receive weekly notification via e-mail of new
preprint information fitting the profile of interest.
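The profile-matching step of such an alert service can be illustrated with a minimal sketch. It assumes a profile is simply a list of keywords per registered user and that a match is a case-insensitive substring hit; the actual PrePRINT Alerts matching rules are not described here, and `match_alerts`, the addresses, and the sample titles are hypothetical.

```python
def match_alerts(profiles, new_items):
    """Return, per user, the new items whose text contains a profile keyword."""
    notifications = {}
    for user, keywords in profiles.items():
        wanted = [k.lower() for k in keywords]
        hits = [item for item in new_items
                if any(k in item.lower() for k in wanted)]
        if hits:  # users with no matching items receive no notification
            notifications[user] = hits
    return notifications


# Hypothetical profiles and newly harvested titles, for illustration only.
profiles = {
    "alice@example.org": ["neutrino"],
    "bob@example.org": ["fuel cell", "catalyst"],
    "carol@example.org": ["superconductor"],
}
new_items = [
    "Solar neutrino flux measurements",
    "Fuel cell membrane durability",
    "Lattice QCD update",
]
weekly = match_alerts(profiles, new_items)
```

A weekly job would run this over the items harvested since the previous run and e-mail each user the list under their address.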
Additional Digital Collections
In addition to this trio of products addressing the three main ways by which researchers disseminate their findings, OSTI has developed complementary digital collections. These include:
• DOE R&D Project Summaries, which provides brief descriptions of over 20,000 R&D projects currently ongoing within the DOE
• DOE R&D Accomplishments, which showcases outcomes of past DOE research and development that have had significant economic impact, have improved people’s lives, or have been widely recognized as a remarkable advance in science
• OpenNet, which covers the DOE legacy collection of declassified documents and has been developed and maintained by OSTI for the DOE’s Office of Declassification
• Subject Portals, which are electronic subject-based collections sponsored by DOE Programs
• Federal R&D Project Summaries, which was developed as a proof of principle to demonstrate the value of having a portal to information about Federal research projects
• GrayLIT Network, which provides access to full-text technical reports from DOE and partnering Federal agencies
DOE R&D Project Summaries (http://www.osti.gov/rnd)
was unveiled in June 1997 to provide the public with access to key corporate
information on over 20,000 research and development projects performed since
1995 by the Department’s laboratories and other research facilities. It includes DOE research activities in a
wide variety of energy-related scientific disciplines. R&D Project Summaries enables DOE
to educate and inform the general public of its current research and
development activities and provides a mechanism for public access to
information about Departmental research capabilities and activities.
The DOE R&D Accomplishments (http://www.osti.gov/accomplishments) Web site showcases the
proud heritage of the Department’s research and development and highlights
benefits that are being realized now.
It was unveiled in March 1999 as a central forum for providing the
public with information about outcomes of past DOE‑sponsored or generated
research and development. The outcomes
featured have had significant economic impact, have improved people’s lives, or
have been widely recognized as a remarkable advance in science. The core of the Web site is the DOE
R&D Accomplishments Database, consisting of searchable full-text and
bibliographic citations of documents reporting accomplishments from DOE and DOE
contractor facilities. Complementing
the Database is a page of "Snapshots." It contains links to items or articles that contain information
about or identify at least one research and development accomplishment. Snapshots are quick pictures, introductions,
overviews, or synopses. When more information about a Snapshots topic is
available via the DOE R&D Accomplishments Database, links to
full-text reports are identified and provided.
OpenNet (http://www.osti.gov/opennet)
provides easy, timely access to recently declassified DOE information,
including information declassified in response to Freedom of Information Act
requests, and makes it more readily available to the public. It includes references to all documents
declassified and made publicly available after October 1, 1994, and supports
the processes envisioned by the Openness Initiative: Public Awareness, Public
Education, Public Input, and Public Access.
Subject Portals, also known as ECAPs (Electronic Current Awareness
Publications) (http://www.osti.gov/ecaps), are a collection of
bibliographic citations, broken out by subject area, from the Energy Science
and Technology Database (EDB). For DOE
reports, links are provided to full text.
These long-standing paper publications were recently transitioned to a
searchable Web product and are now being re-directed into specific Subject
Portals with additional features. This
migration is scheduled for completion in Summer 2001.
Federal R&D Project Summaries (http://www.osti.gov/fedrnd) was released in 2000 and provides a unique window on
the Federal research community, allowing Agencies to better understand the
research and development efforts of their counterparts in government. It gives the public insight into how its
investment in research and development is being used and supports full-text
single-query searching across more than 240,000 research summaries and awards
in databases residing at the Department of Energy, the National Institutes of
Health (NIH), and the National Science Foundation (NSF). The Federal databases available via this
tool are the DOE R&D Project Summaries; the NIH CRISP (Computer Retrieval
of Information on Scientific Projects) Current Awards; and the NSF Award Data.
GrayLIT Network (http://www.osti.gov/graylit) provides a portal for over
100,000 full-text technical reports located at the Department of Energy,
Department of Defense, Environmental Protection Agency (EPA), and National
Aeronautics and Space Administration (NASA).
Collections in the GrayLIT collaboration include the DOE Information
Bridge; the Defense Technical Information Center (DTIC) Report Collection; the
EPA National Environmental Publications Internet Site (NEPIS); the NASA Jet
Propulsion Lab Reports; and the NASA Langley Technical Reports.
EnergyFiles
The umbrella for this suite of resources is EnergyFiles
(http://www.osti.gov/energyfiles),
which was released in May 1997. It is a
Web-based virtual library that provides easy access to over 500 widely diverse
collections of both DOE and worldwide energy-related STI. EnergyFiles is a dynamic information
system that offers users, participants and contributors the opportunity to
leverage collections and capabilities and to maximize use of energy-related
scientific and technical information.
The EnergyFiles search mechanism, EnergyPortal
Search, provides for increased site efficiency and ease of knowledge
discovery. EnergyPortal has
conquered a major obstacle confronting multi-source virtual libraries. It is a unique search capability that
provides distributed searching across the decentralized, heterogeneous
databases and Web sites linked to EnergyFiles. The user no longer needs to select individual links to sift
through available information in pursuit of what is relevant. Words or phrases are entered in a single
query box and the query is distributed in parallel to the user-selected
multiple databases and Web sites residing at diverse locations.
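The single-query, parallel-distribution pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not EnergyPortal's implementation: the in-memory `SOURCES` table stands in for remote databases and Web sites, and the names `search_source` and `distributed_search` are hypothetical. Each selected source is queried concurrently, and a source that fails contributes an empty result rather than aborting the whole search.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote collections. A real source would be a
# remote database or Web site with its own search interface.
SOURCES = {
    "reports": ["Solar cell efficiency report", "Reactor materials study"],
    "preprints": ["Preprint on solar neutrinos", "Lattice QCD preprint"],
    "journals": ["Journal article on fuel cells"],
}


def search_source(name, query):
    """Search one source using its own (here trivial) matching logic."""
    q = query.lower()
    return [title for title in SOURCES[name] if q in title.lower()]


def distributed_search(query, selected):
    """Send one query to every selected source in parallel and merge results."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(selected) or 1) as pool:
        futures = {name: pool.submit(search_source, name, query)
                   for name in selected}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=5)
            except Exception:
                # One unavailable source should not sink the whole search.
                results[name] = []
    return results


hits = distributed_search("solar", ["reports", "preprints"])
```

Grouping results by source keeps the sketch simple; a production system would also need to rank or interleave hits from sources that score relevance differently.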
EnergyPortal Search continues to represent a breakthrough in information retrieval. It enables users to search across 26
databases and 500 Web sites using a single query. The sites are maintained by various agencies, are geographically
dispersed, and require no standardization in terms of format, software or
metadata. EnergyPortal will
search full text when available; DOE databases and collections; databases of
other agencies such as the Defense Technical Information Center (DTIC), the
National Aeronautics and Space Administration (NASA), the National Library of
Medicine (NLM) and the Environmental Protection Agency (EPA); and other
resources. When the individual database
supports it, the searched word or phrase is highlighted for easy access.
This distributed search capability demonstrates an
essential next step in information technology: the integration of parallel
searching and retrieving of information from disparate and geographically
dispersed databases and Web sites. The EnergyPortal
distributed search transcends other government agencies’ full-text information
sources. Since it includes only unlimited, unclassified energy-related
information, users in government, industry, academia and the public benefit
from the addition of this capability. Time is saved through more efficient and
effective information retrieval since the information is accessible on the
Internet in an organized, searchable format.
Physical Sciences Information Infrastructure
For more than 50 years, studies have called for a comprehensive national resource for finding, understanding, and using information about our physical world. A workshop was convened in May 2000 at the National Academy of Sciences for representatives from scientific agencies, academia, professional societies, and the private sector to address issues and gaps in communicating and using information in the physical sciences.
The Workshop resulted in the endorsed vision of a Physical Sciences Information Infrastructure to integrate the whole of science to provide a basis to improve society, the economy, and the environment. Findings in the Workshop report (http://www.osti.gov/physicalsciences) support the need for:
· An Integrated Network to provide comprehensive access and facilitate the reuse of worldwide sources of physical sciences information,
· A Point of Convergence for ensuring the awareness, availability, use, and development of information technologies and tools, and
· An Openly Available Source of information to serve all users with tools to assist them in their quest for information and ultimately knowledge.
The report also emphasized the need for an interagency, academia, professional society, and private sector collaborative effort. OSTI’s leadership in the collection and sharing of worldwide scientific and technical (S&T) information and OSTI’s expertise in information, technology, and the scientific disciplines have laid the foundation for the Physical Sciences Information Infrastructure. OSTI is now pursuing partnerships and consortia agreements for a collaborative implementation strategy for the Physical Sciences Information Infrastructure, which will provide comprehensive access to reliable worldwide information; a managed set of information resources and tools; timely access to ongoing research; and support of multi-organizational collaborations. Resource requirements, partnership arrangements, and numerous other planning activities are currently being explored as support for this initiative continues to grow.
Author Contact Information
Karen J. Spence
Assistant Director
Office of Program Integration
U.S. Department of Energy
Office of Scientific and Technical Information
Oak Ridge, TN 37831
Phone: (865) 574-0295
Fax: (865) 241-3826
spencek@osti.gov
Biographical Sketch for Karen J. Spence:
Karen J. Spence is the Assistant Director for the Office of Program Integration for the U.S. Department of Energy Office of Scientific and Technical Information (OSTI). Among her responsibilities, she provides leadership and coordination for the Department-wide Scientific and Technical Information Program and assures access to the energy-related information that supports the Department of Energy mission. Her duties include coordination with appropriate Departmental organizations, other government agencies, and private and domestic organizations. Ms. Spence is also responsible for OSTI strategic and operational planning, policy development, product management, promotions/marketing, and customer advocacy. She has more than 15 years of government and private industry experience. Throughout her career, she has held numerous management positions related to scientific and technical information. Her knowledge of information science and information management activities has contributed to the development of innovative programs and mechanisms for public access to Government information. She also actively participates in national organizations and groups that promote better management of government information. She has received numerous honors and awards during her career and continues to be a leader in promoting community service activities. She holds a Master’s Degree in Library Science from the University of Alabama.