LHNCBC: Document Abstract
Year: 2006
LHNCBC-2006-041
dTagger: A POS Tagger
Divita G, Browne AC, Loanne R
AMIA 2006 Symposium Proceedings pp 201-203
The Lexical Systems Group at the National Library of Medicine (NLM) has developed a part-of-speech (POS) tagger, dTagger, to be freely distributed with the SPECIALIST NLP Tools. dTagger is designed specifically for use with the SPECIALIST lexicon, but it can be used with an arbitrary tag set. It is capable of single-word or multi-word chunking. It is trainable with previously annotated text, and a version that can be tuned with untagged text is in development. The tagger allows users to add local lexicon content, and it can report likelihoods for each sentence tagged. New words encountered while tagging (the unknowns) are handled by shape identification, including heuristics based on suffix statistics gleaned during training. With supervised training, performance is reported as 95% on a modified version of the MedPost hand-annotated MEDLINE abstracts. Eight percent of the terms in this corpus were multi-word entities.
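The suffix-statistics idea for unknown words can be illustrated with a minimal Python sketch. This is not dTagger's implementation or API: the function names, the three-character suffix limit, the fallback tag, and the toy tag labels (`verb`, `noun`, `adv`) are all assumptions made for illustration, and the abstract does not specify how dTagger combines suffix evidence with its other shape heuristics.

```python
from collections import Counter, defaultdict


def train_suffix_stats(tagged_words, max_suffix_len=3):
    """Count how often each word-final suffix co-occurs with each tag.

    tagged_words is an iterable of (word, tag) pairs from annotated text.
    max_suffix_len is a hypothetical limit chosen for this sketch.
    """
    stats = defaultdict(Counter)
    for word, tag in tagged_words:
        lower = word.lower()
        for n in range(1, min(max_suffix_len, len(lower)) + 1):
            stats[lower[-n:]][tag] += 1
    return stats


def guess_tag(word, stats, max_suffix_len=3, default="noun"):
    """Guess a tag for an unseen word from its longest known suffix.

    Falls back to `default` (an assumed choice) when no suffix was seen.
    """
    lower = word.lower()
    for n in range(min(max_suffix_len, len(lower)), 0, -1):
        counts = stats.get(lower[-n:])
        if counts:
            # Most frequent tag observed with this suffix during training.
            return counts.most_common(1)[0][0]
    return default


# Toy training data; real training would use an annotated corpus.
training = [
    ("binding", "verb"),
    ("signaling", "verb"),
    ("protein", "noun"),
    ("rapidly", "adv"),
    ("slowly", "adv"),
]
stats = train_suffix_stats(training)

print(guess_tag("phosphorylating", stats))
print(guess_tag("markedly", stats))
```

Preferring the longest matching suffix is one simple design choice; a fuller tagger would likely weight evidence from several suffix lengths and other shape cues rather than pick a single winner.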