Mixture-Modeling with Unsupervised Clusters for Domain Adaptation in Statistical Machine Translation

European Association for Machine Translation Conferences/Workshops; Pub Date: 2012-05-29; DOI: 10.5167/UZH-62826

Rico Sennrich

Citations: 23

Abstract

In Statistical Machine Translation, in-domain and out-of-domain training data are not always clearly delineated. This paper investigates how we can still use mixture-modeling techniques for domain adaptation in such cases. We apply unsupervised clustering methods to split the original training set, and then use mixture-modeling techniques to build a model adapted to a given target domain. We show that this approach improves performance over both an unadapted baseline and several alternative domain adaptation methods.
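
The two-step pipeline described in the abstract (split the training set by unsupervised clustering, then combine the resulting sub-models with weights tuned to the target domain) can be illustrated on toy data. The sketch below is not the paper's implementation and rests on several assumptions of our own: sentences are clustered with a small k-means over bag-of-words vectors, each cluster is represented by an add-one smoothed unigram model standing in for a real translation or language model, and the mixture weights are fitted by EM on a tiny target-domain development text. All function names and data are hypothetical.

```python
import math
from collections import Counter

def bow(sentence):
    """Bag-of-words vector for one sentence."""
    return Counter(sentence.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(sentences, k, iters=10):
    """Toy k-means with cosine similarity; the first k sentences seed the clusters."""
    vecs = [bow(s) for s in sentences]
    centroids = [Counter(v) for v in vecs[:k]]
    assign = [0] * len(vecs)
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        for c in range(k):
            members = [v for v, a in zip(vecs, assign) if a == c]
            if members:
                # Summed counts suffice: cosine similarity ignores vector length.
                centroids[c] = sum(members, Counter())
    return assign

def unigram_model(sentences, vocab):
    """Add-one smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def em_weights(models, dev_tokens, iters=50):
    """Fit linear-interpolation weights that maximize dev-set likelihood."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for w in dev_tokens:
            joint = [weights[c] * models[c][w] for c in range(k)]
            z = sum(joint)
            for c in range(k):
                expected[c] += joint[c] / z  # posterior of cluster c for this token
        weights = [e / len(dev_tokens) for e in expected]
    return weights

def mixture_prob(word, models, weights):
    """Probability of a word under the weighted mixture of cluster models."""
    return sum(lam * m[word] for lam, m in zip(weights, models))

# Hypothetical mixed-domain training data with no domain labels.
train = [
    "the patient received the drug",
    "the court rejected the appeal",
    "the drug reduced the fever",
    "the judge heard the appeal",
]
# Hypothetical development text from the target (medical) domain.
dev_tokens = "the patient took the drug".split()

k = 2
assign = kmeans(train, k)
vocab = {w for s in train for w in s.split()} | set(dev_tokens)
models = [
    unigram_model([s for s, a in zip(train, assign) if a == c], vocab)
    for c in range(k)
]
weights = em_weights(models, dev_tokens)
print("cluster assignment:", assign)
print("mixture weights:", weights)
print("p('drug') under mixture:", mixture_prob("drug", models, weights))
```

On this toy input the cluster seeded by the medical sentence receives the larger weight, because the development text shares more vocabulary with it. A real system would replace the unigram models with full translation or language models and would typically use richer clustering features.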
