Managing information disparity in multilingual document collections

ACM Trans. Speech Lang. Process. Pub Date: 2013-03-01 DOI: 10.1145/2442076.2442077

Kevin Duh, C. Yeung, Tomoharu Iwata, M. Nagata


Citations: 18

Abstract

Information disparity is a major challenge with multilingual document collections. When documents are dynamically updated in a distributed fashion, information content among different language editions may gradually diverge. We propose a framework for assisting human editors to manage this information disparity, using tools from machine translation and machine learning. Given source and target documents in two different languages, our system automatically identifies information nuggets that are new with respect to the target and suggests positions to place their translations. We perform both real-world experiments and large-scale simulations on Wikipedia documents and conclude that our system is effective in a variety of scenarios.
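The abstract names two steps: finding source content that is new relative to the target, and suggesting where its translation should go. The Python sketch below is a toy illustration of that idea only. It is not the authors' method, which the abstract does not detail. It assumes the source sentences have already been machine-translated into the target language. It also substitutes a simple word-overlap (Jaccard) score and a fixed threshold for whatever learned models the paper uses. The function names, the threshold value, and the sample sentences are all hypothetical.

```python
import re


def tokens(sentence):
    """Return the set of lowercased word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_insertions(source_translated, target, threshold=0.3):
    """Flag translated source sentences that have no close match in the
    target, and suggest an insertion index for each one.

    Returns a list of (source_index, insert_at) pairs, where insert_at is
    the index in `target` before which the new sentence would be placed.
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    target_sets = [tokens(t) for t in target]
    suggestions = []
    last_matched = -1  # target index matched by the most recent covered sentence
    for i, sentence in enumerate(source_translated):
        s = tokens(sentence)
        scores = [jaccard(s, t) for t in target_sets]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else -1
        if best >= 0 and scores[best] >= threshold:
            # Already covered by the target: remember where it aligned.
            last_matched = best
        else:
            # New with respect to the target: place it right after the
            # last aligned target sentence.
            suggestions.append((i, last_matched + 1))
    return suggestions


# Hypothetical sample data: the source edition has one extra sentence.
source_translated = [
    "Tokyo is the capital of Japan.",
    "It hosted the Olympic Games in 1964.",
    "The city has a large rail network.",
]
target = [
    "Tokyo is the capital of Japan.",
    "The city has a large rail network.",
]

print(suggest_insertions(source_translated, target))  # [(1, 1)]
```

Here the second source sentence is flagged as new and is proposed for insertion at index 1 of the target, between the two existing sentences. A human editor would still review each suggestion, in line with the assistive role the abstract describes.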

