LHNCBC: Document Abstract
Year: 2004
2004-005
Exploiting Parallel Text in Word Sense Disambiguation
Philip Resnik
2004-05-27
The last decade has taught computational linguists that high performance in natural language processing tasks is best obtained using supervised learning techniques, which require annotation of large quantities of training data. But annotated text is hard to obtain. Some have emphasized making the most of limited amounts of annotation. Others have argued that we should focus on simpler learning algorithms and find ways to exploit much larger quantities of text, though those efforts have tended to focus on linguistically shallow problems. In this talk, I describe my efforts to exploit larger quantities of data while still addressing linguistically deeper problems, focusing on word sense disambiguation. The key idea is to take advantage of text in parallel translation, using a second language as evidence about the first. When two distinct English words can be translated as the same word in a second language, it often indicates that the two are being used in senses that share some element of meaning. For example, 'bank' is ambiguous (it can refer to a financial institution or to the sloping land beside a river), and so is 'shore' (it can refer both to a shoreline and to a piece of wood used as a brace or support). But both 'bank' and 'shore' can be translated into French as 'rive', and this fact suggests that the two senses corresponding to that translation have something in common. I will describe an algorithm that takes advantage of this observation to induce preferences over word sense alternatives on the English side, without the benefit of any manually sense-annotated data.
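
The abstract describes the algorithm only at the level of this intuition. As a rough illustration (not the algorithm presented in the talk), the Python sketch below uses NLTK's WordNet interface to score each English word's noun senses by how closely they resemble some sense of the other words sharing the same translation; the translation group for French 'rive' is hard-coded here, whereas in practice such groups would be extracted from word-aligned parallel text.

    # Minimal sketch of the shared-translation intuition, not Resnik's method.
    # Requires nltk and the WordNet corpus (nltk.download('wordnet')).
    from itertools import product
    from nltk.corpus import wordnet as wn

    def sense_preferences(translation_group):
        """Score each word's noun senses by similarity to the senses of the
        other words that share the same foreign-language translation."""
        scores = {word: {s: 0.0 for s in wn.synsets(word, pos=wn.NOUN)}
                  for word in translation_group}
        for w1, w2 in product(translation_group, repeat=2):
            if w1 == w2:
                continue
            for s1, s2 in product(wn.synsets(w1, pos=wn.NOUN),
                                  wn.synsets(w2, pos=wn.NOUN)):
                sim = s1.path_similarity(s2) or 0.0
                # A sense of w1 gets credit for resembling some sense of w2.
                scores[w1][s1] = max(scores[w1][s1], sim)
        return scores

    if __name__ == "__main__":
        # 'bank' and 'shore' both translate to French 'rive'; the
        # riverbank/shoreline senses should tend to score highest.
        prefs = sense_preferences(["bank", "shore"])
        for word, sense_scores in prefs.items():
            best = max(sense_scores, key=sense_scores.get)
            print(word, "->", best.name(), "-", best.definition())

This uses WordNet path similarity as a stand-in for "shared element of meaning"; the preferences it produces are induced without any manually sense-annotated data, which is the point the abstract emphasizes.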