Research, Condition, and Disease Categorization

 

Brief Description

 

Each year, NIH reports to Congress and the public how much money is allocated to approximately 360 research and disease categories, such as Parkinson's, mental health, diabetes, and cancer. Congress and the NIH Office of the Director use these data to better understand NIH research spending and priority areas. In the past, each institute and center (IC) assigned its grants to these categories based on its own interpretation of the category definitions, which led to inaccurate and incomplete reporting.

Congress, advocacy groups, and the public have recently requested that the agency improve how it reports NIH funding levels. As a result, the Research, Condition, and Disease Categorization (RCDC) project was created to develop a knowledge management infrastructure that standardizes and facilitates budget reporting by research topic. Text mining techniques have been implemented to classify NIH grant applications into the proper research and disease categories. HPCIO collaborates with the RCDC project team to identify and develop technologies and methodologies that can improve the RCDC categorization tool and enhance internal NIH communication processes.

The end result of this project will improve reliability and consistency of categorization across ICs.

 

Using previously labeled grants, a neural network may be applied to find best-fitting category fingerprints, which can be refined by domain experts if necessary. A dashboard has been implemented to monitor the status of each category in the fingerprint development or maintenance process. Each category has a file folder in the RCDC SharePoint site, where documents can be stored to facilitate collaboration among ICs.

 

List of Collaborators

  • Timothy Hays, Ph.D.; Project Director, RCDC; Chief, Portfolio Analysis and Scientific Opportunities Branch; Office of Portfolio Analysis and Strategic Initiatives; Office of the Director

  • Sonja Gardner-Clarke; Portfolio Analysis and Scientific Opportunities Branch; Office of Portfolio Analysis and Strategic Initiatives; Office of the Director

  • Deb Kassilke; Portfolio Analysis and Scientific Opportunities Branch; Office of Portfolio Analysis and Strategic Initiatives; Office of the Director

  • Archna Bhandari; Noblis, Inc.; Portfolio Analysis and Scientific Opportunities Branch; Office of Portfolio Analysis and Strategic Initiatives; Office of the Director

 

 

Major Accomplishments in FY 2007

 

HPCIO developed several systems and prototypes in FY 2007 to address various needs of the RCDC community. To increase the transparency of the fingerprint development process, we implemented a Web-based dashboard using .NET technologies to track the status of the fingerprints. The information conveyed by the dashboard allows IC coordinators to determine more easily what their IC experts need to do. In addition, the RCDC Portal has been migrated to a SharePoint 2003 site to facilitate file sharing and collaboration. We have conducted initial experiments to investigate the use of machine learning techniques to generate category fingerprints. HPCIO's findings indicate that a perceptron can leverage labeled data to produce high-quality fingerprints that are comparable to those generated by human experts. Generating a fingerprint by machine takes considerably less time than generating one by a committee of experts. This research has the potential to reduce the labor required by the fingerprint generation process and to help ICs develop fingerprints that they would not otherwise have the resources to produce.
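The perceptron experiment described above can be illustrated with a minimal sketch. It is not HPCIO's implementation: it assumes, for illustration only, that a fingerprint can be represented as a set of weighted terms, that grants are plain text, and that labels are 1 (grant belongs to the category) or 0 (it does not). The function names `tokenize`, `train_perceptron`, `predict`, and `top_terms` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def score(weights, bias, terms):
    """Sum the learned weights of the terms that occur in a grant."""
    return bias + sum(weights.get(t, 0.0) for t in terms)


def train_perceptron(texts, labels, epochs=20):
    """Learn term weights from labeled grants with the perceptron rule.

    Assumption: labels are 1 (in category) or 0 (not in category).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for text, label in zip(texts, labels):
            terms = tokenize(text)
            predicted = 1 if score(weights, bias, terms) > 0 else 0
            if predicted != label:
                # Move weights toward the correct side of the boundary.
                step = 1.0 if label == 1 else -1.0
                for t in terms:
                    weights[t] += step
                bias += step
                mistakes += 1
        if mistakes == 0:  # every training grant is classified correctly
            break
    return dict(weights), bias


def predict(weights, bias, text):
    """Return 1 if the grant text is assigned to the category, else 0."""
    return 1 if score(weights, bias, tokenize(text)) > 0 else 0


def top_terms(weights, k=10):
    """Return the k highest positively weighted terms (a candidate fingerprint)."""
    positive = [(t, w) for t, w in weights.items() if w > 0]
    positive.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in positive[:k]]
```

In this sketch, the terms returned by `top_terms` play the role of a machine-generated fingerprint that a domain expert could then review and refine, as the text suggests.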

 

 

Anticipated Major Accomplishments in FY 2008

 

In FY 2008, HPCIO intends to develop a neural network that utilizes the MATLAB Distributed Computing Engine to process large sets of fingerprints. More category fingerprints will be generated using the neural network. In addition, we shall develop better methodologies to evaluate the quality of the system-generated fingerprints. If the results are satisfactory, the network will be integrated into the RCDC system.

 

We shall add functionality to the dashboard and work with the RCDC project team to incorporate it into the RCDC site. The RCDC SharePoint site will be migrated to the 2007 version, and the site will be customized to improve usability.

 

 

 

Metrics

 

Fingerprint Status Dashboard

  • Number of categories being tracked: 324

RCDC SharePoint

  • Number of active users: 108

  • Number of items stored: 3053

Automatic Fingerprint Generation

  • Number of category fingerprints generated: 9

  • Average F-measure of the fingerprints: 0.7469

  • Average time to generate a fingerprint: < 1 min
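
For readers unfamiliar with the F-measure reported above, the following sketch shows how such a score could be computed for one category and averaged across categories. The report does not state which variant was used; this example assumes the balanced F1 score (beta = 1) and an unweighted average over categories, and the function names and grant identifiers are hypothetical.

```python
def f_measure(predicted, actual, beta=1.0):
    """F-measure for one category.

    predicted: set of grant IDs the fingerprint assigns to the category
    actual:    set of grant IDs that experts labeled with the category
    beta=1.0 gives the balanced F1 score (an assumption, not from the report).
    """
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_f_measure(results):
    """Unweighted mean of per-category F-measures.

    results: list of (predicted_set, actual_set) pairs, one per category.
    """
    scores = [f_measure(p, a) for p, a in results]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a fingerprint that selects four grants, three of which are among the five expert-labeled grants, has precision 0.75 and recall 0.6.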