Institute meeting on 15 June 2011, Room M 12.21

Statistical Machine Translation with Weighted Grammars

Talk by Matthias Büchse, PhD student in computer science, Technische Universität Dresden

Abstract

Weighted grammars have a firm place in research on statistical machine translation (SMT). Recent examples of such grammars are synchronous context-free grammars (Chiang, 2007), synchronous tree-insertion grammars (Nesson, Shieber, and Rush, 2006), and synchronous tree-adjoining grammars (DeNeefe and Knight, 2009). The translation systems built on each of these formalisms achieve competitive BLEU scores.

One benefit of grammar-based models is that many results from formal-language theory and automata theory carry over to SMT, either directly or with little effort. Examples of this transfer are the problems of intersecting languages and of finding shortest paths, both of which occur frequently in decoding. In addition, a translation system specified in such a framework can run on any platform that offers a corresponding toolkit.
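
To illustrate how intersection and shortest paths meet in decoding, here is a minimal sketch under strong simplifying assumptions: a toy weighted synchronous grammar in a binary normal form (each rule either rewrites to a single source word, or to two nonterminals whose order the target side keeps or swaps), with rules and probabilities invented for illustration. CKY parsing of the input computes the intersection of the grammar's source side with the input sentence, and keeping only the cheapest item per chart cell, with costs -log p, is a shortest-path (Viterbi) search for the best derivation. The names LEXICAL, BINARY, relax and decode are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy grammar (rules and probabilities invented for illustration).
# Lexical rules: A -> <source word, target words> with probability p.
LEXICAL: Dict[Tuple[str, str], List[Tuple[Tuple[str, ...], float]]] = {
    ("D", "la"): [(("the",), 1.0)],
    ("N", "maison"): [(("house",), 0.8), (("home",), 0.2)],
    ("A", "bleue"): [(("blue",), 1.0)],
}
# Binary rules: (A, B, C, inverted, p) stands for A -> <B C, B C> if not inverted,
# and for A -> <B C, C B> if inverted.
BINARY: List[Tuple[str, str, str, bool, float]] = [
    ("NP", "D", "NB", False, 1.0),
    ("NB", "N", "A", True, 0.9),   # adjective moves before the noun
    ("NB", "N", "A", False, 0.1),  # source order kept
]

Item = Tuple[float, Tuple[str, ...]]  # (cost = sum of -log p, target words)


def relax(cell: Dict[str, Item], lhs: str, cost: float, target: Tuple[str, ...]) -> None:
    """Keep only the cheapest derivation per nonterminal (Viterbi / shortest path)."""
    if lhs not in cell or cost < cell[lhs][0]:
        cell[lhs] = (cost, target)


def decode(words: List[str], goal: str = "NP") -> Optional[Tuple[str, float]]:
    n = len(words)
    # chart[(i, j)] maps a nonterminal to its cheapest item over words[i:j].
    chart: Dict[Tuple[int, int], Dict[str, Item]] = {}
    for i, word in enumerate(words):
        cell: Dict[str, Item] = {}
        for (lhs, src), options in LEXICAL.items():
            if src == word:
                for target, p in options:
                    relax(cell, lhs, -math.log(p), target)
        chart[(i, i + 1)] = cell
    # Combine adjacent spans bottom-up (CKY); this builds the intersection of the
    # grammar's source side with the input string.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for lhs, b, c, inverted, p in BINARY:
                    if b in left and c in right:
                        cost = -math.log(p) + left[b][0] + right[c][0]
                        if inverted:
                            target = right[c][1] + left[b][1]
                        else:
                            target = left[b][1] + right[c][1]
                        relax(cell, lhs, cost, target)
            chart[(i, j)] = cell
    best = chart.get((0, n), {}).get(goal)
    if best is None:
        return None  # the grammar cannot derive this input
    cost, target = best
    return " ".join(target), math.exp(-cost)
```

For the input ["la", "maison", "bleue"] the sketch should return "the blue house" with a probability of about 0.72 (0.9 × 0.8). A practical decoder such as Chiang's (2007) additionally intersects the target side with an n-gram language model, which this sketch omits.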

In this talk, we first briefly recall the four main tasks of SMT: modeling, training, decoding, and evaluation. Then, guided by an example, we approach these tasks from the perspective of weighted grammars. If time permits, we briefly review the state of the art in this setting.
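
For the evaluation task, the BLEU score mentioned above compares n-gram counts of a system output with those of a reference translation. The following is a minimal sketch, assuming a single tokenized reference, uniform weights for n = 1 to 4, and no smoothing; the function names are hypothetical. Corpus-level BLEU pools the clipped counts and lengths over a whole test set before combining them, so it generally differs from an average of sentence scores.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) occurring in tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it occurs in ref.
        matches = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if matches == 0:
            return 0.0  # without smoothing, a zero precision makes BLEU zero
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: punish hypotheses shorter than the reference.
    c, r = len(hyp), len(ref)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)
```

Scores lie between 0 and 1 and are often reported multiplied by 100; a hypothesis identical to its reference of at least four tokens scores 1.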

References

David Chiang, 2007. Hierarchical phrase-based translation. Computational Linguistics 33(2):201–228. http://www.mitpressjournals.org/doi/pdf/10.1162/coli.2007.33.2.201

Rebecca Nesson, Stuart M. Shieber, and Alexander Rush, 2006. Induction of probabilistic synchronous tree-insertion grammars for machine translation. In Proc. AMTA 2006. http://www.eecs.harvard.edu/~shieber/Biblio/Papers/Nesson-2006-IPS.pdf

Steve DeNeefe and Kevin Knight, 2009. Synchronous tree adjoining machine translation. In Proc. EMNLP 2009. http://www.isi.edu/natural-language/mt/adjoin09.pdf