Translating Collocations for Bilingual Lexicons: A Statistical Approach
Authors: Frank Smadja, K. McKeown, V. Hatzivassiloglou
Journal: Comput. Linguistics, volume 22, issue 1
Publication date: 1996-03-01
DOI: 10.7916/D8C82M3R (https://doi.org/10.7916/D8C82M3R)
Citations: 578
Abstract
Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations. Our goal is to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations in which n and p need not be the same. For example, Champollion translates make...decision, employment equity, and stock market into prendre...décision, équité en matière d'emploi, and bourse respectively. Testing Champollion on three years' worth of the Hansards corpus yielded the French translations of 300 collocations for each year, evaluated at 73% accuracy on average. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.
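The abstract does not say which statistical measure is used, so the following is only a minimal sketch of how an n-word collocation might be mapped to a p-word translation over sentence-aligned text. It assumes the Dice coefficient as the association score and a simple grow-and-prune search over target word sets. The function names (`dice`, `translate_collocation`), the threshold, the tie-breaking rule, and the toy corpus are all hypothetical illustrations, not the Champollion implementation or Hansards data.

```python
def dice(freq_x, freq_y, freq_xy):
    """Dice coefficient 2*|X and Y| / (|X| + |Y|); 0.0 when both counts are zero."""
    if freq_x + freq_y == 0:
        return 0.0
    return 2.0 * freq_xy / (freq_x + freq_y)


def translate_collocation(pairs, source_words, threshold=0.6, max_len=3):
    """Return (target word set, score) for a source collocation.

    pairs: list of (source_tokens, target_tokens) aligned sentences.
    threshold and max_len are assumed tuning knobs for this sketch.
    """
    source_words = set(source_words)
    # Aligned pairs whose source side contains every word of the collocation.
    src_hits = {i for i, (src, _) in enumerate(pairs) if source_words <= set(src)}

    # Inverted index: target word -> indices of pairs whose target side has it.
    tgt_index = {}
    for i, (_, tgt) in enumerate(pairs):
        for word in set(tgt):
            tgt_index.setdefault(word, set()).add(i)

    def score(words):
        # Pairs whose target side contains all candidate words (order ignored).
        tgt_hits = set.intersection(*(tgt_index[w] for w in words))
        return dice(len(src_hits), len(tgt_hits), len(src_hits & tgt_hits))

    # Single target words that correlate well enough with the collocation.
    singles = [w for w in tgt_index if score([w]) >= threshold]
    level = {frozenset([w]) for w in singles}

    best_words, best_score = frozenset(), 0.0
    for size in range(1, max_len + 1):
        for cand in level:
            s = score(cand)
            # Assumed tie-break: prefer the longer candidate on equal scores.
            if s > best_score or (s == best_score and len(cand) > len(best_words)):
                best_words, best_score = cand, s
        if size == max_len:
            break
        # Grow each surviving set by one surviving word, then prune again.
        grown = {cand | {w} for cand in level for w in singles if w not in cand}
        level = {cand for cand in grown if score(cand) >= threshold}
        if not level:
            break
    return best_words, best_score


# Invented toy corpus, for illustration only.
toy_pairs = [
    ("we must make a decision".split(), "nous devons prendre une décision".split()),
    ("they will make the decision".split(), "ils vont prendre la décision".split()),
    ("the decision was hard".split(), "la décision était difficile".split()),
    ("we take the train".split(), "nous allons prendre le train".split()),
    ("we must leave".split(), "nous devons partir".split()),
]

words, value = translate_collocation(toy_pairs, ["make", "decision"])
print(sorted(words), value)
```

On this toy data the two-word set scores higher than either word alone, because "prendre" and "décision" each also occur in a sentence that lacks the source collocation. The sketch treats candidates as unordered word sets; recovering word order and handling inflected forms would need additional steps not shown here.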