eRA Working Group Explores Technology-Assisted Disease Coding
eRA is collaborating with representatives from seven Institutes and Centers (ICs) to evaluate the use of advanced text-mining technology to improve NIH’s reporting on funding by disease. In his testimony to the House Appropriations Labor/Health and Human Services (HHS) Subcommittee on April 22, NIH Director Elias Zerhouni said that NIH would implement “intelligent data mining” to provide better accounting to Congress and the public on NIH’s investment by disease. Currently, the NIH Office of the Budget must prepare agency-wide reports on more than 230 diseases and conditions.

With the dual objectives of standardizing definitions and automating disease coding, the NIH Director’s Steering Committee requested that eRA form a working group to test the feasibility of using specific text-mining tools to assist with disease coding accurately and consistently. The Knowledge Management Disease Coding (KMDC) Working Group, which began meeting in April, presented its findings to the Steering Committee on October 21.

About Collexis® Technology

Collexis® refers to a family of intelligent text-searching tools for examining vast quantities of data to identify patterns and establish relationships. As biomedical data grows to petabytes (millions of gigabytes), managing this data becomes increasingly important. Intelligent text mining holds promise for promoting health research and accelerating discoveries by automating the integration of multiple databases to find linkages and generate hypotheses.

In January 2004, after evaluating available systems, NIH procured a site license for Collexis software. This software is based on the principle of “fingerprinting” each piece of text that contains relevant information, such as an article in a scientific journal. The fingerprinting process makes use of the professional terminology of a particular field. For example, the system can fingerprint an article based on the National Library of Medicine Medical Subject Headings (MeSH®) Thesaurus.
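The fingerprinting principle described above can be illustrated with a minimal sketch. This is not the Collexis algorithm, whose internals the article does not describe. It assumes a fingerprint is simply a count of controlled-vocabulary concepts found in a text, and that two fingerprints can be compared with cosine similarity. The tiny thesaurus and the function names `fingerprint` and `similarity` are hypothetical stand-ins for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical mini-thesaurus: surface phrase -> controlled concept label.
# A real system would draw on a full vocabulary such as MeSH.
THESAURUS = {
    "asthma": "Asthma",
    "diabetes": "Diabetes Mellitus",
    "insulin": "Insulin",
    "breast cancer": "Breast Neoplasms",
}


def fingerprint(text, thesaurus=THESAURUS):
    """Return a Counter mapping each concept to how often it appears in text."""
    lowered = text.lower()
    counts = Counter()
    for phrase, concept in thesaurus.items():
        # Whole-word, case-insensitive match of the thesaurus phrase.
        hits = re.findall(r"\b" + re.escape(phrase) + r"\b", lowered)
        if hits:
            counts[concept] += len(hits)
    return counts


def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints (0.0 if either is empty)."""
    shared = fp_a.keys() & fp_b.keys()
    dot = sum(fp_a[c] * fp_b[c] for c in shared)
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, two texts that mention the same concepts in similar proportions score close to 1.0, and texts with no concepts in common score 0.0.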
Collexis can then condense the fingerprints of all of a researcher’s publications into a knowledge profile of that individual. Once Collexis has completed the fingerprinting and profiling of all sources of input, the system can make associations based on criteria established by the user.

Consider this application. A busy Helpdesk receives several hundred e-mails daily that require responses from an expert. Knowledge management (KM) helps the Helpdesk by building knowledge profiles of all its employees. From then on, routing an incoming e-mail is a matter of matching its fingerprint against the catalog of employee knowledge profiles.

Progress of Disease Coding Working Group

The KMDC Working Group, led by Richard Morris, has made significant headway over the past four months. The group completed the following tasks:
The KMDC policy group, composed of representatives from the participating ICs and led by Izja Lederhendler, began meeting to consider basic principles of operation and options for governance. With technical assistance from Patti Gaines, Archna Bhandari, and Chanath Ratnanather, who prepared the data, the policy group will present its findings to the NIH Steering Committee. Throughout the process, Norka Ruiz-Bravo (NIH deputy director for Extramural Research) and Richard Turman (director, NIH Office of the Budget) worked with Richard Morris, Izja Lederhendler, Della Hann, and Lee Pushkin to guide the effort.

Previous NIH Pilots

Several earlier pilots at NIH demonstrated proof of concept of KM’s promise for optimizing eRA knowledge assets and shortening grant cycle times. According to a statement by CSR Division of Biologic Basis of Disease Director Elliot Postow on May 17, the introduction of electronic grant applications and referral technologies could reduce the review cycle by six to eight weeks.
For more information about the eRA KM initiative, contact Richard Morris at RMorris@niaid.nih.gov.