An Interagency Model for Collaboration and Operation
Background - Relationship to CENDI - Funding - Operations - Milestones
CENDI Meeting, Nov. 4, 2010
Sharon Jordan
Assistant Director
DOE Office of Scientific and Technical Information
(Operating Agent for Science.gov)
What Is Science.gov?
A Unique Collaboration with Tangible Results!
• An interagency science discovery tool providing single-query access to government-sponsored R&D results and other S&T information from multiple agencies
• A cross-agency search that integrates and simplifies access to 200 million pages of content from 14 U.S. science agencies
• The "USA.gov" science portal (formerly "FirstGov for Science")
• A voluntary large-scale collaboration of U.S. government agencies
Science.gov drills down into selected databases and websites in parallel, then presents relevancy-ranked search results.
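The parallel "drill down" described above can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration under stated assumptions, not Science.gov's actual implementation: the `fan_out` function and the stub sources are hypothetical, and each source is assumed to be a callable that takes a query string and returns a list of result records.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical types: a record is a plain dict; a source is any callable query -> records.
Record = Dict[str, Any]
Source = Callable[[str], List[Record]]


def fan_out(query: str, sources: Dict[str, Source]) -> Dict[str, List[Record]]:
    """Send one query to every source concurrently and collect each source's results."""
    results: Dict[str, List[Record]] = {}
    # One worker per source, so all sources are queried at once and the total
    # wait is roughly that of the slowest source rather than the sum of all.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing source contributes nothing instead of failing the whole query.
                results[name] = []
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for agency databases.
    def energy_db(q: str) -> List[Record]:
        return [{"title": f"{q} report", "source_score": 0.9}]

    def offline_db(q: str) -> List[Record]:
        raise RuntimeError("source unavailable")

    print(fan_out("diabetes", {"energy": energy_db, "offline": offline_db}))
```

Isolating each source's failure is the key design choice here: with dozens of independently operated databases, one unavailable source should shrink the result set, not break the search.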
How Did It Begin?
• Science.gov grew out of two workshops:
• 2000: A blue-ribbon panel explored the concept of a physical science information infrastructure (http://www.osti.gov/physicalsciences/wkshprpt.pdf), which prompted interagency involvement.
• 2001: "Strengthening the Public Information Infrastructure for Science" (http://www.science.gov/workshop/index.html), where the interagency Science.gov Alliance was formed.
• Participants included federal agencies, academia, information professionals, and science experts.
• Science.gov gained approval as "FirstGov for Science" in early 2002.
• Science.gov was launched in December 2002.
Founding Agencies in 2001
• Department of Agriculture
• Department of Commerce
• Department of Defense
• Department of Education
• Department of Energy
• Department of Health and Human Services
• Department of the Interior
• Environmental Protection Agency
• National Aeronautics and Space Administration
• National Science Foundation
New Alliance Members
• Department of Transportation
• Library of Congress
• United States Government Printing Office
• National Archives and Records Administration
Alliance-Only Members
• United States Forest Service
• National Institute of Standards and Technology
Support and coordination are provided by CENDI, an interagency forum of senior information managers.
Shared Premises
• Science is not bounded by agency, organization or geography
• Each agency has vast stores of information that fulfill its mission
• A single web gateway is the tool of choice*
• A commitment to voluntary collaboration is necessary
*OCLC's Perceptions of Libraries and Information Resources reported that 84% of the public began their information searches with search engines and only 1% began with online databases. Thus, an easy "Google-like" search of authoritative sources with relevant results was desired.
Integration Challenges
• Broad scope of Federal science and technology research and development missions
• Wide-ranging interests of potential audiences
• Information organization (taxonomy) issues, given the broad scope of disciplines and audiences
• Blending information resources from different agencies into cohesive functionality and page design
• Politics, human resources, funding, and sustainability
Guiding Principles for Content
√ Select authoritative web-based government-sponsored information resources
√ Rich science content, not merely organization pages
√ Databases contain primarily R&D results in the form of STI (bibliographic data and/or full documents)
√ Supplemented by websites for currency
√ Only freely available content that is well maintained
√ Our audience is "the science-attentive citizen"!
Agency Potluck
• Agencies brought to the Internet table their unique information specialties and resources
• A commitment to a flagship service
• Notable contributions:
• Science.gov Alliance and CENDI - seized the opportunity without a mandate
• FirstGov.gov - supported the early stages with advice and two grants
• Member agencies - contributed 200 staff members to working teams
• NLM - provided usability testing prior to the initial launch
• USGS - managed the original website search engine (surface web search)
• NTIS - created the initial catalog of S&T websites
• IIa Inc. - provided secretariat support (CENDI special task)
• DOE/OSTI - conceived the idea, developed the technologies and deep web search, and hosted the website
• NAL and USGS - provided the Science.gov Alliance co-chairs
Collaboration Is Key
• Alliance enjoyed extraordinary voluntary collaboration
• Vision and strategic direction provided by Alliance principals
• Administration provided by Chair(s) selected from the Alliance
• Technical team provided original technical direction and recommendations
• Major support provided by CENDI
• Additional task groups formed as needed
• Science.gov taxonomy
• Content guidance and development
• Website management and redesign
• Outreach activities
• Enhancement development
• Subject expansion
• Image library
The Funding Approach
• Built and maintained with "in-kind" contributions: each agency's staff time and existing information resources
• Initial development benefited from CIO Council e-gov grants for the catalog and the initial deep web search
• Alliance annual dues help fund routine operations
• CENDI support leverages resources
• In-kind contributions supported special events
• SBIR R&D resulted in innovations that were implemented in subsequent versions
• "Pass the hat" contributions to take advantage of an opportunity, such as Version 3.0 development
Science.gov Funding
Doing "a lot with a little" by implementing creative funding methods
• 2001: Cross-agency portal grants: $170,000
• 2002: DOE SBIR-funded relevancy ranking research
• 2003-2004: Voluntary pass-the-hat contributions: $200,000
• 2001-present: Participating agencies develop and maintain Science.gov through fees and in-kind support, averaging approximately $180K annually since 2005
CENDI
• CENDI promotes the productive intersection of science content, technology and interrelationships
• The Alliance, made up of CENDI agencies plus others, provides direction and support for this intersection in the form of Science.gov
• Through financial and in-kind commitments from its agencies, CENDI provides the ongoing infrastructure needed to offer a large-scale collaboration across organizational boundaries
Overview of CENDI Finances
• Total membership funds are combined into one "pot"
• CENDI Reserve
• Executive Secretariat for CENDI, which includes Science.gov support
• Maintenance costs, which include Alliance-Only dues*
• A portion of Secretariat effort is used for Science.gov tasks
*Science.gov Alliance-Only dues are deposited into the CENDI treasury and may be used either for direct Science.gov costs and purchases (such as exhibit expenses) or toward overall Secretariat support of Science.gov.
Content Management Is Distributed
• NTIS developed the original "catalog" with input from agencies
• CENDI Secretariat now maintains catalog with agency participation
• Agency content managers submit and edit their information via a web form
• Websites identified in the catalog were originally indexed by USGS; indexing is now done by OSTI
• Deep web databases are identified by agencies and reviewed by the team for suitability
• Real-time search of content in large databases is maintained by OSTI, which continues to host the website and serve as operations manager
The Alliance Members' Page
Provides administrative information, meeting minutes, usage statistics, content selection and cataloging guidelines, subject category information, and outreach materials such as presentations and flyers.
Metadata Input System: For Websites in Searchable Index ("Surface Web" portion of Science.gov)
Provides Alliance members and content managers a secure tool to quickly retrieve agency metadata, add or edit resource records, and expedite the maintenance and quality control of metadata and URLs.
Development Milestones
• Science.gov Phase 1 (2001-2002)
• Established policy and governance teams and technical design teams
• Agreed on goals, policies, website look & feel
• Created taxonomy
• Selected, cataloged and indexed agency resources
• Version 2.0 launched May 2004
• Introduced relevancy ranking of metasearch results
• One-step search across ALL databases
• Added advanced search
• Version 3.0
• Enhanced precision searching, metarank, and Boolean/fielded searching
• Other types of science content explored
• Version 4.0
• Enhanced relevancy ranking, plus full-text relevancy ranking
Development Milestones
• Version 5.0 (Sept 2008)
• Clustering of results by subtopics or dates to help target your search
• Wikipedia results related to your search terms
• EurekAlert news results related to your search terms
• Mark-and-send option to email results to friends and colleagues
• More science sources for a more thorough search
• Enhanced information related to your real-time search
• New look and feel
• Updated Alerts Service
• Standardized citation formats available for download
• Version 5.1: Aggregated news feeds from 11 science agencies
• Internships and Fellowships section made searchable
• Image Search Library (Coming soon!)
Science.gov Today
Science.gov: Finds Content from 200 Million Pages at 2100+ Websites and 42 Databases with One Query
• Searches selected websites ("surface web") and databases ("deep web") from one search point
• Combines results from all sources, ranks and displays by relevance and clusters
• Sends weekly "alerts" for user-defined topics of interest
• Displays related Wikipedia and EurekAlert items
• Provides browsing of selected websites
• Displays an integrated news feed from science agencies
• Links to special collections and other information
• Featured searches and sites highlight hot topics
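A weekly alert like the one listed above amounts to re-running a saved topic against recently added records. The sketch below is a hypothetical illustration only: it assumes each record carries a `title` string and an `added` date, and it uses simple case-insensitive substring matching; the actual Science.gov alert service's matching rules are not described in this presentation.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

Record = Dict[str, Any]  # hypothetical layout: {"title": str, "added": date}


def weekly_alert(topic: str, records: List[Record], today: date) -> List[Record]:
    """Return records added in the 7 days ending today whose title contains every topic term."""
    since = today - timedelta(days=7)
    terms = topic.lower().split()
    hits = []
    for rec in records:
        title = rec["title"].lower()
        # Window is (today - 7 days, today]: exactly seven calendar days.
        if since < rec["added"] <= today and all(term in title for term in terms):
            hits.append(rec)
    return hits


if __name__ == "__main__":
    new_records = [
        {"title": "Climate change impacts on agriculture", "added": date(2010, 11, 1)},
        {"title": "Solar power outlook", "added": date(2010, 11, 3)},
    ]
    for hit in weekly_alert("climate change", new_records, date(2010, 11, 4)):
        print(hit["title"])
```

Requiring every term (rather than any term) keeps alerts narrow, which suits a weekly e-mail digest; a real service would more likely re-run the full ranked search.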
42 Large Scientific Databases
Easy-to-Use Search
Get the simplicity of a "Google-type" search box; get results that are not "Google-like" at all.
Results overlap less than 1% with Google and approximately 3.2% with Google Scholar
Precise, Accurate Results
More About Science.gov You May Not Know
• Goes where traditional search engines cannot go: full-text documents that are searchable on the target site are also searchable via Science.gov.
• Real-time search: when a target database adds a document or record, it is available on Science.gov immediately
• During the query, the most relevant documents or records from each source are gathered (approximately 100-200 from each source), and then the combined set is relevancy ranked
• Topic and date clusters for search results: subtopics and publication years are displayed on the fly to enable efficient drilling down
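The merge step described above (gather the top records from each source, then rank the combined set) and the on-the-fly year clusters can be sketched as follows. The term-count scoring is a deliberately simple stand-in, not the relevancy-ranking method Science.gov actually uses; `merge_and_rank`, `year_clusters`, the default limit of 100, and the record fields (`title`, `snippet`, `year`) are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def term_score(query: str, text: str) -> int:
    """Toy relevance score: how many words in the text match a query term (case-insensitive)."""
    terms = set(query.lower().split())
    return sum(1 for word in text.lower().split() if word in terms)


def merge_and_rank(query: str, per_source: Dict[str, List[Record]],
                   per_source_limit: int = 100) -> List[Record]:
    """Keep each source's top records, pool them, and re-rank the pool with one scoring function."""
    pooled: List[Record] = []
    for source, records in per_source.items():
        # Assumes each source already returns its records best-first.
        for record in records[:per_source_limit]:
            pooled.append({**record, "source": source})
    # One scoring function makes records from different sources comparable;
    # sorted() is stable (even with reverse=True), so ties keep their pooled order.
    return sorted(
        pooled,
        key=lambda r: term_score(query, f"{r.get('title', '')} {r.get('snippet', '')}"),
        reverse=True,
    )


def year_clusters(records: List[Record]) -> List[Tuple[int, int]]:
    """Count results per publication year, newest year first, for a date-cluster sidebar."""
    counts = Counter(r["year"] for r in records if "year" in r)
    return sorted(counts.items(), reverse=True)


if __name__ == "__main__":
    results = {
        "energy": [{"title": "Solar cells", "year": 2009},
                   {"title": "Climate change and climate models", "year": 2010}],
        "epa": [{"title": "Climate change policy", "year": 2010}],
    }
    ranked = merge_and_rank("climate change", results)
    print([r["title"] for r in ranked])
    print(year_clusters(ranked))
```

Re-scoring the pooled set is what makes cross-source ranking possible: each source's own relevance scores use different scales, so they cannot simply be interleaved.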
Usage Continues to Grow
Science.gov Page View Totals (Dec 02 - Sep 10)
FY10 - 5,166,126
FY09 - 4,074,747
FY08 - 2,946,801
FY07 - 2,591,717
FY06 - 2,593,449
FY05 - 1,793,483
FY04 - 965,146
FY03 - 751,180
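The year-over-year growth behind this slide can be computed directly from the totals above. The short sketch below simply restates those figures; `page_views` and `yoy_growth` are illustrative names, not part of any Science.gov tool.

```python
from typing import Dict

# Page view totals by fiscal year, copied from the slide above.
page_views: Dict[int, int] = {
    2003: 751_180,
    2004: 965_146,
    2005: 1_793_483,
    2006: 2_593_449,
    2007: 2_591_717,
    2008: 2_946_801,
    2009: 4_074_747,
    2010: 5_166_126,
}


def yoy_growth(series: Dict[int, int]) -> Dict[int, float]:
    """Percent change from each fiscal year to the next."""
    years = sorted(series)
    return {
        year: 100.0 * (series[year] - series[prev]) / series[prev]
        for prev, year in zip(years, years[1:])
    }


if __name__ == "__main__":
    for year, pct in yoy_growth(page_views).items():
        print(f"FY{year % 100:02d}: {pct:+.1f}%")
```

Run on these figures, the calculation shows FY07 as the only year with a slight decline (about -0.1% from FY06), and FY10 page views at roughly 6.9 times the FY03 total.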
Notable Achievements
• Large voluntary collaboration between agencies is often cited as a model
• Both the collaboration and the infrastructure served as a model for WorldWideScience.org, to which Science.gov then contributed the U.S. content
• Also a model for ScienceEducation.gov
• Ranks among the top 10 Google results for "science," alongside other major science outlets
• Provides core project for spin-offs such as Science Internships, Aggregated Science News, Science Image Search – and more!
Science.gov In the News
Science.gov is among 10 government websites "meeting and exceeding" the Obama Administration's transparency goals, according to a special report by Government Computer News released July 27, 2009.
Feature comparison (Real-time search? Relevancy ranked? All government science? Known sources? Scholarly information? Ads?):
• Science.gov 5.0: X X X X X
• WorldWideScience.org: X X X X X
• Google Scholar BETA: X X X
• Google: X X
Content and Purpose: Science.gov vs Data.gov
Science.gov
• Searches for science topics at the full-record level
• Ease of searching, with immediate, useful results
• For the science-attentive citizen, including researchers, teachers, students, business people, and the general public
• A Google-like interface with an advanced option for power users
• Drills down into the "deep web"
Examples:
• 2668 results for diabetes from 35 sources
• 2772 results for climate change from 38 sources
Data.gov
• Searches at the source level only, not at the record level
• Interface with search results pointing only to sources or databases
• Emphasizes machine-readable datasets available in raw formats; some files are quite large, up to hundreds of megabytes
• Data generally require additional manipulation and are of limited use to the general public; public interest groups, reporters, academics, and others are expected to review the information, build interfaces, and report on findings
Examples:
• Zero results for specific terms such as diabetes
• One result (a database pointer) for climate change
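Science.gov vs Data.gov feature comparison:
• Ready-to-use information with a user-friendly interface: Science.gov
• Record-level information: Science.gov
• Science research and results only: Science.gov
• Information from multiple agencies: Science.gov, Data.gov
• Repository of datasets and tools: Data.gov
• Provides a pointer to the database or source: Science.gov, Data.gov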
√ A perfect platform on which to launch new technologies
• Access to new forms of STI
• Translation
• Precision searches
• Image searching
[Screenshots: Current Science.gov and Prototype]