Mixture-Modeling with Unsupervised Clusters for Domain Adaptation in Statistical Machine Translation

European Association for Machine Translation Conferences/Workshops. Pub Date: 2012-05-29. DOI: 10.5167/UZH-62826

Rico Sennrich

Citations: 23

Abstract

In Statistical Machine Translation, in-domain and out-of-domain training data are not always clearly delineated. This paper investigates how we can still use mixture-modeling techniques for domain adaptation in such cases. We apply unsupervised clustering methods to split the original training set, and then use mixture-modeling techniques to build a model adapted to a given target domain. We show that this approach improves performance over both an unadapted baseline and several alternative domain adaptation methods.
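The two-step idea in the abstract (cluster the training data, then mix the per-cluster models toward a target domain) can be illustrated with a toy sketch. The abstract does not say which clustering algorithm, model type, or weighting scheme is used, so everything below is an assumption made for illustration: smoothed unigram models stand in for the translation or language models, sentences are hard-assigned to clusters by likelihood in a k-means-like loop, and the mixture weights are fit with EM on a small in-domain development set. The function names and the toy data are hypothetical.

```python
from collections import Counter
import math

def unigram_model(sentences, vocab, alpha=0.1):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def cluster(sentences, vocab, k, iters=10):
    """Hard-assign sentences to k clusters by unigram log-likelihood."""
    # Assumption: deterministic round-robin initialisation.
    assign = [i % k for i in range(len(sentences))]
    for _ in range(iters):
        models = [unigram_model([s for s, a in zip(sentences, assign) if a == c], vocab)
                  for c in range(k)]
        new = [max(range(k),
                   key=lambda c: sum(math.log(models[c][w]) for w in s.split()))
               for s in sentences]
        if new == assign:  # converged
            break
        assign = new
    return assign

def em_weights(models, dev_tokens, iters=50):
    """EM for linear interpolation weights that raise dev-set likelihood."""
    k = len(models)
    lam = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for w in dev_tokens:
            joint = [lam[c] * models[c][w] for c in range(k)]
            z = sum(joint)
            for c in range(k):
                expected[c] += joint[c] / z  # posterior of component c
        lam = [e / len(dev_tokens) for e in expected]
    return lam

# Toy training set with no domain labels (hypothetical data).
train = [
    "patient drug dose",
    "patient dose drug dose",
    "dose patient drug",
    "court judge appeal",
    "judge court appeal court",
    "appeal court judge",
]
# Small development set from the target domain (hypothetical data).
dev = "patient dose drug patient".split()

vocab = {w for s in train for w in s.split()} | set(dev)
assign = cluster(train, vocab, k=2)
models = [unigram_model([s for s, a in zip(train, assign) if a == c], vocab)
          for c in range(2)]
weights = em_weights(models, dev)

# Adapted model: weighted linear mixture of the per-cluster models.
mixed = {w: sum(weights[c] * models[c][w] for c in range(2)) for w in vocab}
print(assign, [round(x, 3) for x in weights])
```

In this toy setup the cluster whose vocabulary matches the development set receives almost all of the mixture weight. A hard-assignment loop like this one is sensitive to its initialisation, so a real system would likely need a more careful starting point.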



