Research, Condition, and Disease Categorization
Brief Description
On a yearly basis, NIH reports to the Congress and the
public how much money is allocated to approximately 360 research and disease
categories such as Parkinson's, mental health, diabetes, and cancer. Congress
and the NIH Office of the Director use this data to better understand NIH
research spending and priority areas. In the past, each institute and center
(IC) assigned its grants to these categories based on its own interpretation
of the category definitions, which led to inaccurate and incomplete reporting.
Congress, advocacy groups, and the public have recently requested that the agency improve how it reports NIH funding levels. As a result, the Research, Condition, and Disease Categorization (RCDC) project was created to develop a knowledge management infrastructure that standardizes and facilitates budget reporting by research topic. Text mining techniques have been implemented to classify NIH grant applications into the proper research and disease categories. HPCIO collaborates with the RCDC project team to identify and develop technologies and methodologies that can improve the RCDC categorization tool and enhance internal NIH communication processes.
The end result of this project will improve reliability and consistency of categorization across ICs.
Using previously labeled grants, a neural network
can be applied to find the best-fitting category fingerprints, which can be refined
by domain experts if necessary. A dashboard has been implemented to monitor the
status of each category in the fingerprint development or maintenance process.
Each category has a file folder in the RCDC SharePoint site, where documents
can be stored to facilitate collaborations among ICs.
List of Collaborators
• Timothy Hays, Ph.D.; Project Director, RCDC; Chief, Portfolio Analysis and Scientific Opportunities Branch; Office of Portfolio Analysis and Strategic Initiatives; Office of the Director
• Sonja Gardner-Clarke, Portfolio Analysis and Scientific Opportunities Branch, Office of Portfolio Analysis and Strategic Initiatives, Office of the Director
• Deb Kassilke, Portfolio Analysis and Scientific Opportunities Branch, Office of Portfolio Analysis and Strategic Initiatives, Office of the Director
• Archna Bhandari, Noblis, Inc.; Portfolio Analysis and Scientific Opportunities Branch; Office of Portfolio Analysis and Strategic Initiatives; Office of the Director
Major Accomplishments in FY 2007
HPCIO developed several systems and prototypes in FY 2007 to address various needs of the RCDC community. To increase the transparency of the fingerprint development process, we implemented a Web-based dashboard using .NET technologies to track the status of the fingerprints. The information conveyed by the dashboard allows IC coordinators to determine more easily what their IC experts need to do. In addition, the RCDC Portal has been migrated to a SharePoint 2003 site to facilitate file sharing and collaboration. We have conducted initial experiments to investigate the use of machine learning techniques to generate category fingerprints. HPCIO's findings indicate that a perceptron can leverage labeled data to produce high-quality fingerprints that are comparable to those generated by human experts. Generating a fingerprint by machine also takes considerably less time than generating one by a committee of experts. This research has the potential to make fingerprint generation less labor-intensive and to help ICs develop fingerprints that they would not otherwise have the resources to produce.
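To illustrate how a perceptron might turn labeled grants into a category fingerprint, the following is a minimal sketch and not the HPCIO implementation. It assumes that a fingerprint can be represented as a list of positively weighted terms and that grant text is reduced to simple word counts; the function names, the toy grant texts, and the labels are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words (a stand-in for real concept extraction)."""
    return re.findall(r"[a-z]+", text.lower())


def train_perceptron(docs, labels, epochs=10):
    """Learn term weights for one category from labeled grant texts.

    docs   -- list of grant texts
    labels -- list of bools, True if the grant belongs to the category
    """
    weights = Counter()
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for text, label in zip(docs, labels):
            feats = Counter(tokenize(text))
            score = bias + sum(weights[t] * c for t, c in feats.items())
            if (score > 0) != label:
                # Classic perceptron update: move weights toward the true label.
                sign = 1 if label else -1
                for term, count in feats.items():
                    weights[term] += sign * count
                bias += sign
                mistakes += 1
        if mistakes == 0:  # training data is separated; stop early
            break
    return weights, bias


def fingerprint(weights, top_k=10):
    """Return the highest positively weighted terms (assumed fingerprint format)."""
    return [(t, w) for t, w in weights.most_common(top_k) if w > 0]


def classify(text, weights, bias):
    """Return True if the text scores as a member of the category."""
    feats = Counter(tokenize(text))
    return bias + sum(weights[t] * c for t, c in feats.items()) > 0


# Hypothetical toy data for a "diabetes" category.
docs = [
    "insulin resistance in type 2 diabetes patients",
    "glucose monitoring and insulin therapy for diabetes",
    "tumor suppressor genes in breast cancer",
    "dopamine neurons and motor symptoms in parkinson disease",
]
labels = [True, True, False, False]

weights, bias = train_perceptron(docs, labels)
print(fingerprint(weights))
print(classify("insulin dosing in diabetes", weights, bias))
```

In this sketch, a domain expert could review the printed term list and remove or reweight entries, which mirrors the expert refinement step described above.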
Anticipated Major Accomplishments in FY 2008
In FY 2008, HPCIO intends to develop a neural network that utilizes the MATLAB Distributed Computing Engine to process large sets of fingerprints. More category fingerprints will be generated using the neural network. In addition, we shall develop better methodologies to evaluate the quality of the system-generated fingerprints. If the results are satisfactory, the network will be integrated into the RCDC system.
We shall add functionality to the dashboard and work with the RCDC project team to incorporate it into the RCDC site. The RCDC SharePoint site will be migrated to the 2007 version. Customizations will be made to the site to improve usability.
Metrics
Fingerprint Status Dashboard | 324 |
RCDC SharePoint | 108 | 3053 |
Automatic Fingerprint Generation | 9 | 0.7469 | < 1 min |