The emergence of systems biology as a research paradigm for DOE missions has resulted in dramatic increases in data flow from a new generation of genomics-based technologies. The heterogeneous mix of data and information emanating from the Genomic Science program includes functional descriptions assigned to DNA sequences, molecular interactions, images of molecules or physical structures within a microbe or plant, and details about the environment in which these organisms live. The Genomic Science program's ultimate goal of achieving a predictive understanding of biological systems will require integrating and comparing these immense volumes of data, which span diverse environmental conditions, spatial scales (nanometers to kilometers), and temporal scales (nanoseconds to decades). To address these data-intensive computing challenges and serve the research community, DOE funded the development of a Systems Biology Knowledgebase in 2011. (See KBase.)
A knowledgebase is a cyberinfrastructure consisting of a collection of data, organizational methods, standards, analysis tools, and interfaces representing a dynamic body of knowledge. A knowledgebase will support open community science by serving as a freely available computational environment for sharing and integrating diverse biological data types, accessing and developing software for data analysis, and providing resources for modeling and simulation. It will leverage community-wide capabilities, experimental results, and modeling efforts and bring together research products from many different projects and laboratories to create an extensible, comprehensive cyberinfrastructure focused on DOE scientific objectives related to microbes, plants, and metacommunities (complex communities of organisms). Several recently completed and ongoing projects are contributing to knowledgebase development (see figure above).
A fully functional knowledgebase is envisioned not only to include storage, retrieval, management, and integration of systems biology data, but also to enable new knowledge acquisition and management through free and open access to data, analytical software, modeling tools, and information for the research community. The vision and justification for a knowledgebase were described in detail in the Systems Biology Knowledgebase for a New Era in Biology workshop report (see also the DOE Systems Biology Knowledgebase brochure PDF). Envisioned capabilities included:
Learn more about the anticipated benefits of the DOE Systems Biology Knowledgebase on the Why KBase? page.
The success of a knowledgebase will rely largely on its ability to meet the dynamic information needs of different user communities and the willingness of these communities to support open sharing of data, science, and software (see figure above). When research data and information are not publicly available to the scientific community, a corresponding price is paid in missed opportunities, barriers to innovation and collaboration, and lost productivity resulting from inadvertent repetition of similar work.