Method of Extracting Text Present in a Color Image

Aliases:

None

Technical Challenge:

Extracting text from a color image, especially where the text is integrated with a graphic, is useful for optical character recognition (OCR) and for conducting a text search. Color images that integrate text and graphics communicate in an immediate and effective manner and are widely used. However, such images are often a complex mixture of shapes and colors arranged in unpredictable ways, which makes it difficult to automatically extract or separate the text from the rest of the color image.

Description:

In this method, "text integrated with a graphic" means that the text and the graphic are not located in separate regions of the image but are combined in some way (e.g., overlaid). The method extracts text from a color image by receiving a color image made up of pixels in any color component system, converting the color image to a grayscale image using one of several conversion methods, and comparing the grayscale image to a user-definable threshold to produce a binary image that may be further processed by an optical character reader or a search engine.
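The conversion and thresholding steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the patented procedure: the three conversion methods (`luma`, `average`, `red`), the default threshold of 128, the assumption of dark text on a lighter background, and all function names are hypothetical choices made for the example. The source does not say which conversions the method actually uses.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each component in 0..255
ColorImage = List[List[Pixel]]
GrayImage = List[List[float]]
BinaryImage = List[List[int]]

# Hypothetical conversion methods; the source only says "one of several".
CONVERSIONS: Dict[str, Callable[[Pixel], float]] = {
    # Weighted sum using the common BT.601 luma weights (an assumption).
    "luma": lambda p: 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2],
    # Plain mean of the three components.
    "average": lambda p: (p[0] + p[1] + p[2]) / 3.0,
    # A single color component used directly as the gray value.
    "red": lambda p: float(p[0]),
}


def to_grayscale(image: ColorImage, method: str = "luma") -> GrayImage:
    """Convert every pixel to one gray value with the named conversion."""
    convert = CONVERSIONS[method]
    return [[convert(pixel) for pixel in row] for row in image]


def binarize(gray: GrayImage, threshold: float = 128.0) -> BinaryImage:
    """Mark pixels darker than the threshold as 1 (candidate text), others as 0.

    Assumes dark text on a lighter background; invert the test otherwise.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def extract_binary_images(
    image: ColorImage, threshold: float = 128.0
) -> Dict[str, BinaryImage]:
    """Apply each conversion, then the threshold, giving one binary image per method."""
    return {
        name: binarize(to_grayscale(image, name), threshold)
        for name in CONVERSIONS
    }
```

Each binary image returned by `extract_binary_images` could then be passed to an OCR engine. Producing one candidate per conversion is a design choice of this sketch: a conversion that washes out the text in one image may preserve it in another.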

The processing method presented here allows automatic recovery of text from color images by operations that reduce alphanumeric characters in the image to black-and-white, followed by recognition with commercially available OCR software. Although a parallel approach is advocated to handle the wide class of images expected in practice, the operations are mathematically simple and should be straightforward to implement. When tied to a dictionary of key words and phrases, the textual output may be used to rank the value of an image and draw attention to the color original as deserving further examination.
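The keyword-based ranking mentioned above could look something like the following. This is a minimal sketch under stated assumptions: the scoring rule (a weighted count of whole-word phrase matches), the function names, and the example dictionary are all hypothetical, and the OCR text is assumed to have been produced already by a separate engine.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens: List[str], phrase_tokens: List[str]) -> int:
    """Count occurrences of a phrase as a run of consecutive tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return 0
    return sum(
        1
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase_tokens
    )


def score_text(text: str, keywords: Dict[str, float]) -> float:
    """Sum weight * match count over every key word or phrase (hypothetical rule)."""
    tokens = tokenize(text)
    return sum(
        weight * count_phrase(tokens, tokenize(phrase))
        for phrase, weight in keywords.items()
    )


def rank_images(
    ocr_results: Dict[str, str], keywords: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (image name, score) pairs, highest score first."""
    scored = [(name, score_text(text, keywords)) for name, text in ocr_results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Example usage with made-up OCR output and a made-up dictionary.
keywords = {"account number": 3.0, "invoice": 1.0}
ocr_results = {
    "a.png": "INVOICE: account number enclosed",
    "b.png": "Summer sale, 50% off",
    "c.png": "invoice invoice",
}
ranking = rank_images(ocr_results, keywords)
```

Images at the top of `ranking` would be the ones flagged for a closer look at the color original.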

Demonstration Capability:

A MATLAB demo can be prepared easily.

Potential Commercial Application(s):

Inclusion in commercial off-the-shelf (COTS) OCR engines (e.g., OmniPage).

Patent Status:

Issued - United States Patent Number 6,519,362

Reference Number: 1098

If you are interested in exploring this technology further, please call 443-445-7159 or express your interest in writing to the:

National Security Agency
NSA Technology Transfer Program
9800 Savage Road, Suite 6541
Fort George G. Meade, Maryland 20755-6541

 

Date Posted: Jan 15, 2009 | Last Modified: Jan 15, 2009 | Last Reviewed: Jan 15, 2009

 
National Security Agency / Central Security Service