Advanced Technology Program

Project Brief


Open Competition 1 - Information Technology

Syntax- and Rule-Based Decoding for Statistical Machine Translation Systems


Develop an integrated, statistical phrase-based and syntactic rule approach to machine translation to improve grammaticality and accuracy of translated materials, enabling broader use of machine-based translation by national security, government and business organizations.

Sponsor: Language Weaver, Inc.

4640 Admiralty Way
Suite 1210
Marina del Rey, CA 90292
  • Project Performance Period: 12/1/2004 - 11/30/2007
  • Total project (est.): $3,344,318.00
  • Requested ATP funds: $1,972,557.00

High-quality machine translation of natural languages has been a dream of researchers for more than 50 years, but it has yet to reach a level of sophistication and reliability sufficient for widespread application. The dominant approach for the past 30 years has relied on handcrafted linguistic rules. Such systems are very expensive to build because trained linguists must enter large numbers of rules manually, and they do not scale up well to a general system. They also produce translations that are awkward and hard to understand. In recent years, a newer approach based on statistical models, in which a word or phrase is translated to one of a number of possibilities based on the probability that it would occur in the current context, has achieved marked success. The best examples substantially outperform rule-based systems. Statistics-based machine translation (SMT) also may prove easier and less expensive to expand, if the system can be taught new knowledge domains or languages by giving it large samples of existing human-translated texts. Despite some success, however, severe problems still exist: outputs are often ungrammatical, and the quality and accuracy of translation fall well below those of a human linguist and well below the demands of all but highly specialized commercial markets. Language Weaver, a leader in SMT research, proposes to overcome these limitations with a hybrid system that would still be fundamentally statistics-based but would incorporate higher-level abstract syntax rules to arrive at the final translation. Such hybrids have been explored in the research community, but without any real success, because it is difficult to merge the fundamentally different approaches. Language Weaver proposes a more complex and tightly integrated approach.
The company will develop new algorithms that exploit knowledge of how words, phrases and patterns should be translated; knowledge of how syntax-based and non-syntax-based translation rules should be applied; and knowledge of how syntactically based target structures should be generated. Cross-lingual parsers of increasing complexity will be developed, as well as methods to choose different syntactic orderings in different situations. Language Weaver is a small company attempting to expand its existing business in translation software and does not have the resources to pursue this far-reaching research track without ATP support. If the company succeeds where others have failed, it will open up a significantly larger share of the $10 billion translation business to high-quality machine translation. Beyond that, there are far-reaching social and economic benefits: quicker translation of intelligence information will aid the war on terrorism; U.S. businesses will be able to boost export sales by translating more sales and product literature; governments worldwide will be better able to provide support and services to non-native language populations; and the translation costs of doing international business will be lowered.
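
The statistical approach described above, in which a word or phrase is translated to one of several possibilities according to how likely it is in the current context, can be illustrated with a minimal sketch. This is not Language Weaver's method. The probability tables, the example word "banco", the `UNSEEN_PROB` floor, and the function name `choose_translation` are all hypothetical, invented only to show the general idea of combining a translation probability with a context probability.

```python
import math

# Hypothetical toy probabilities, invented purely for illustration.
# P(target phrase | source phrase): how often each translation is seen.
TRANSLATION_PROB = {
    "banco": {"bank": 0.6, "bench": 0.4},
}

# P(target phrase | previous target word): how well a candidate fits the context.
CONTEXT_PROB = {
    ("savings", "bank"): 0.4,
    ("savings", "bench"): 0.001,
    ("park", "bank"): 0.01,
    ("park", "bench"): 0.3,
}

UNSEEN_PROB = 1e-6  # assumed small floor for context pairs missing from the table


def choose_translation(source_phrase: str, previous_word: str) -> str:
    """Return the candidate with the highest combined log-probability."""
    candidates = TRANSLATION_PROB[source_phrase]

    def score(candidate: str) -> float:
        translation_p = candidates[candidate]
        context_p = CONTEXT_PROB.get((previous_word, candidate), UNSEEN_PROB)
        # Summing logs is equivalent to multiplying the two probabilities.
        return math.log(translation_p) + math.log(context_p)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(choose_translation("banco", "savings"))
    print(choose_translation("banco", "park"))
```

In this toy setup the same source word receives a different translation depending on the preceding word. A real system would estimate such probabilities from large samples of human-translated text rather than from hand-written tables.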

For project information:
Beth Walsh, (858) 724-2500
Beth@clearpointagency.com

ATP Project Manager
Jack Boudreaux, (301) 975-3560
jack.boudreaux@nist.gov


ATP website comments: webmaster-atp@nist.gov
NIST is an agency of the U.S. Commerce Department