
Last updated: January 13, 2009


The Communications Engineering Branch (CEB) is part of the Lister Hill National Center for Biomedical Communications, an R&D division of the U.S. National Library of Medicine. Our mission is to conduct research and development on tasks critical to NLM and NIH, such as cancer research, document delivery, digital preservation, and automated methods for building resources such as MEDLINE®. All software products developed by our researchers are freely available.

A major focus at CEB is image engineering for both biomedical and document images. This includes research in image processing, compression and enhancement. Other research targets text and data mining. Our work in these areas employs machine learning and natural language processing techniques. Scientific papers by staff are listed under Publications.

Biomedical Imaging 

Our work in Biomedical Imaging has two goals: first, to develop advanced imaging tools for biomedical research in partnership with the National Cancer Institute and other organizations; second, to conduct research in Content-Based Image Retrieval (CBIR), which indexes and retrieves medical images by image features (e.g., shape, color and texture), augmented by textual features. This work includes the development of CervigramFinder, for retrieval of uterine cervix images by image features; SPIRS, for retrieval of digitized x-ray images of the spine from NHANES II; and SPIRS-IRMA, a distributed global system developed in collaboration with Aachen University, Germany, for retrieving medical images by both high-level and detailed features.

CBIR is also part of the Image Text Indexing (ITI) project, which seeks to automatically index illustrations in medical articles by processing the text of figure captions and of passages in the article that mention the figures, as well as image features in the illustrations themselves.

Document Image Analysis 

Research in document image analysis and understanding is directed toward the automated extraction of bibliographic data from scanned and Web-based medical journal articles to populate MEDLINE®. This research focuses on designing rule-based as well as machine learning algorithms (e.g., Support Vector Machines, Hidden Markov Models) that rely on geometric, OCR-generated and contextual features of the documents. These algorithms perform page segmentation, zone labeling and named entity extraction. Based on this research, we have developed, and now maintain, two production systems in operation at NLM: MARS (Medical Article Records System) and PDR (Publisher Data Review).

Digital Preservation Research 

Document image analysis and machine learning are also foundational to our Digital Preservation Research. We have developed a System for Preservation of Electronic Resources (SPER) that has key attributes for affordable long-term preservation, such as automated metadata extraction and bulk migration. SPER is currently being used to preserve 70,000 historic documents from the Food and Drug Administration.

Document Processing & Delivery 

The DocView project has produced a suite of systems for document processing and delivery. DocView itself is client software that libraries use to receive documents sent by interlibrary loan services using Ariel® and similar systems. DocMorph lets users upload files in any of 50 formats through a Web browser for automatic conversion to PDF, TIFF or OCR-generated text. As an alternative to the browser, users may run MyMorph, a client that uses DocMorph for bulk conversion of thousands of files. MyDelivery is a system for the secure delivery of very large (gigabyte-sized) files.

Interactive Publications 

To address the increasing use of multimedia in scientific publishing, our research in Interactive Publications (IP) has resulted in prototype interactive documents integrating text with medical multimedia and large data tables. We have developed Panorama, a tool for viewing and manipulating medical images and video, analyzing tabular data, and converting tables to graphs and back.

Clinical Informatics 

The Repository for Informed Decision Making (RIDeM) project focuses on developing data mining tools that automatically build a repository of key facts extracted from the biomedical literature. These are the facts needed for informed clinical decision making, and the tools rely on machine learning and natural language processing. NLM InfoBot is a related project that aims to develop a system that automatically augments a patient's electronic medical record (EMR) with pertinent information extracted from NLM resources. The InfoBot software would run as background agents both at NLM and at a clinical site. The clinical site would use our APIs to integrate search setup into its existing EMR system and to display and store the results there.

Visualizing Rare Books 

Turning The Pages (TTP) displays photorealistic animations of high-resolution scans of rare historical biomedical books in NLM's collection. The kiosk version at the Library, equipped with touchscreen monitors, lets patrons 'touch and turn' the pages, zoom in on details and hear voice annotations. TTP Online lets Internet users 'click and turn' the pages. Both versions include 16th-century books by Vesalius, Gesner and Paré. The kiosks also show the world's oldest surviving surgical text, the Edwin Smith Papyrus, written around the 17th century BCE.

Nursing Home Screener 

Our research into systems that provide useful information to the public includes the Nursing Home Screener, a Web 2.0 system that integrates databases of quality information with Google Maps. It gives the public a way to narrow a search for suitable nursing homes by geography and by measures of quality.

Visible Human Project 

Another system for the public, AnatQuest, allows users to view Visible Human images and 3D renderings of anatomic objects.

Datasets & Repositories 

In addition to conducting research, we create and maintain ground truth databases (human-validated data) for computer science and medical informatics research. These collections include annotated spinal x-rays and uterine cervix images.

MARG (Medical Article Records Groundtruth) is a collection of bitmapped images of medical article pages, together with validated OCR text and zoned, labeled page regions. It is used to develop and test algorithms for page segmentation and for labeling regions by bibliographic entity (article title, author names, affiliations and abstract).


CEB Paper Award at AMIA 2008

A CEB paper on the Nursing Home Screener won a Distinguished Paper Award at the American Medical Informatics Association (AMIA) Symposium, November 2008.



NIH Research Festival 2008 Photos

NIH Research Festival, October 14-17, 2008, at the Natcher Conference Center.


National Institutes of Health (NIH)
9000 Rockville Pike
Bethesda, Maryland 20892

U.S. Dept. of Health and Human Services
