A relationship: word alignment, phrase table, and translation quality

ScientificWorldJournal. 2014:2014:438106. doi: 10.1155/2014/438106. Epub 2014 Apr 16.

Abstract

In recent years, researchers have conducted several studies that evaluate machine translation quality in terms of the relationship between word alignments and the phrase table. However, existing methods usually employ ad hoc heuristics without theoretical support. So far, no work has provided a formula describing the relationship among word alignments, the phrase table, and machine translation performance. In this paper, we first formulate such a relationship for estimating the number of extracted phrase pairs given one or more word alignment points. Second, we propose a corpus-motivated pruning technique to prune the default large phrase table. Experiments show that the deduced formula is feasible: it can be used to predict the size of the phrase table, and it is also a valuable reference for investigating the relationship between translation performance and phrase tables built from different word alignment links. The corpus-motivated pruning results show that nearly 98% of phrases can be removed without any significant loss in translation quality.
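
To illustrate why the number of extracted phrase pairs depends on the word alignment points, the sketch below counts phrase pairs under the standard consistency criterion used in phrase-based systems: a source span and a target span form a pair if at least one alignment link falls inside both spans and no link connects a word inside one span to a word outside the other. This is not the formula derived in the paper, which the abstract does not state. The function name `extract_phrase_pairs`, the `max_len` parameter, and the toy alignments are assumptions made for illustration only.

```python
def extract_phrase_pairs(src_len, tgt_len, links, max_len=7):
    """Enumerate span pairs (s1, s2, t1, t2) consistent with the alignment.

    `links` is a set of (source_index, target_index) alignment points.
    Brute-force enumeration; a hypothetical helper, not the paper's method.
    """
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(t1 + max_len, tgt_len)):
                    has_link_inside = False
                    consistent = True
                    for s, t in links:
                        in_src = s1 <= s <= s2
                        in_tgt = t1 <= t <= t2
                        if in_src and in_tgt:
                            has_link_inside = True
                        elif in_src != in_tgt:
                            # Link crosses the boundary of the span pair.
                            consistent = False
                            break
                    if has_link_inside and consistent:
                        pairs.append((s1, s2, t1, t2))
    return pairs


# Toy 3-word sentence pair with progressively fewer alignment links.
full = {(0, 0), (1, 1), (2, 2)}
two_links = {(0, 0), (2, 2)}
one_link = {(1, 1)}

for name, links in [("full", full), ("two", two_links), ("one", one_link)]:
    print(name, len(extract_phrase_pairs(3, 3, links)))
```

On this toy example the count grows from 6 to 9 to 16 as links are removed, because unaligned words can attach freely to neighbouring phrases. This is the kind of dependence between alignment points and phrase table size that a predictive formula would need to capture.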

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Language
  • Linguistics / methods
  • Natural Language Processing*
  • Translating*