用结构化和基于网络的语言模型对机器翻译假设进行重新排序

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI:10.1109/ASRU.2007.4430102

Wen Wang, A. Stolcke, Jing Zheng

{"title":"用结构化和基于网络的语言模型对机器翻译假设进行重新排序","authors":"Wen Wang, A. Stolcke, Jing Zheng","doi":"10.1109/ASRU.2007.4430102","DOIUrl":null,"url":null,"abstract":"In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system. These language models, developed from constraint dependency grammar parses, tightly integrate knowledge of words, morphological and lexical features, and syntactic dependency constraints. Two structured language models are applied for N-best rescoring, one is an almost-parsing language model, and the other utilizes more syntactic features by explicitly modeling syntactic dependencies between words. We also investigate effective and efficient language modeling methods to use N-grams extracted from up to 1 teraword of web documents. We apply all these language models for N-best re-ranking on the NIST and DARPA GALE program1 2006 and 2007 machine translation evaluation tasks and find that the combination of these language models increases the BLEU score up to 1.6% absolutely on blind test sets.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Reranking machine translation hypotheses with structured and web-based language models\",\"authors\":\"Wen Wang, A. Stolcke, Jing Zheng\",\"doi\":\"10.1109/ASRU.2007.4430102\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system. These language models, developed from constraint dependency grammar parses, tightly integrate knowledge of words, morphological and lexical features, and syntactic dependency constraints. Two structured language models are applied for N-best rescoring, one is an almost-parsing language model, and the other utilizes more syntactic features by explicitly modeling syntactic dependencies between words. We also investigate effective and efficient language modeling methods to use N-grams extracted from up to 1 teraword of web documents. We apply all these language models for N-best re-ranking on the NIST and DARPA GALE program1 2006 and 2007 machine translation evaluation tasks and find that the combination of these language models increases the BLEU score up to 1.6% absolutely on blind test sets.\",\"PeriodicalId\":371729,\"journal\":{\"name\":\"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2007.4430102\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2007.4430102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

摘要

在本文中，我们研究了在统计机器翻译系统中使用语言动机和计算效率高的结构化语言模型对n个最佳假设进行重新排序。这些语言模型由约束依赖语法解析发展而来，紧密地集成了单词知识、形态和词汇特征以及句法依赖约束。两种结构化语言模型应用于N-best评分，一种是几乎解析语言模型，另一种通过显式建模词之间的句法依赖性来利用更多的句法特征。我们还研究了有效和高效的语言建模方法，使用从多达1兆字的web文档中提取的N-grams。我们将所有这些语言模型应用于NIST和DARPA GALE项目1 2006年和2007年的机器翻译评估任务上进行N-best重新排名，发现这些语言模型的组合在盲测试集上将BLEU分数绝对提高了1.6%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Reranking machine translation hypotheses with structured and web-based language models

In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system. These language models, developed from constraint dependency grammar parses, tightly integrate knowledge of words, morphological and lexical features, and syntactic dependency constraints. Two structured language models are applied for N-best rescoring, one is an almost-parsing language model, and the other utilizes more syntactic features by explicitly modeling syntactic dependencies between words. We also investigate effective and efficient language modeling methods to use N-grams extracted from up to 1 teraword of web documents. We apply all these language models for N-best re-ranking on the NIST and DARPA GALE program1 2006 and 2007 machine translation evaluation tasks and find that the combination of these language models increases the BLEU score up to 1.6% absolutely on blind test sets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

自引率

0.00%

发文量