NIST

Aho-Corasick

(algorithm)

Definition: A multiple string matching algorithm that constructs a finite state machine from a pattern (list of keywords), then uses the machine to locate all occurrences of the keywords in a body of text.

Generalization (I am a kind of ...)
string matching.

See also Commentz-Walter works from the back, like Boyer-Moore.

Note: Construction of the machine takes time proportional to the sum of the lengths of the keywords. The machine is used to process the text string in a single pass. The number of transitions made by the machine is independent of the number of keywords.

Author: VO

Implementation

Bruce Watson's SPARE Parts, a string pattern recognition toolkit (C++).

More information

Alfred V. Aho and Margaret J. Corasick, Efficient string matching: an aid to bibliographic search, CACM, 18(6):333-340, June 1975.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.

Entry modified 21 September 2006.
HTML page formatted Mon Jul 16 11:56:53 2012.

Cite this as:
Vadim Okun, "Aho-Corasick", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 21 September 2006. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/ahoCorasick.html

to NIST home page