Using cognates to align sentences in bilingual corpora
Authors: Michel Simard, George F. Foster, P. Isabelle
Venue: Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages
Publication date: 1993-10-24
DOI: https://doi.org/10.1145/962411
In a recent paper, Gale and Church describe an inexpensive method for aligning bitext, based exclusively on sentence lengths [3]. While this method produces surprisingly good results (a success rate of around 96%), even better results are required for tasks such as the computer-assisted revision of translations. In this paper, we examine some of the weaknesses of Gale and Church's program and explain how just a small amount of linguistic knowledge would help to overcome them. We discuss how cognates provide a cheap and reasonably reliable source of linguistic knowledge. To illustrate this, we describe a modification to the program in which the alignment criterion is cognates rather than sentence lengths. Finally, we show how better results may be obtained, and obtained more efficiently, by combining the two criteria: length and "cognateness". Our method can be generalized to accommodate other sources of linguistic knowledge, and experimentation shows that it produces better results than alignments based on length alone, at minimal cost.
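To make the idea of "cognateness" concrete, here is a minimal sketch of how a cognate-based score for a candidate sentence pair might be computed. The abstract does not specify how cognates are detected or scored, so everything below is an assumption made for illustration: the prefix test in is_cognate (tokens of at least four characters sharing their first four, with tokens containing digits required to match exactly), the greedy one-to-one matching, the normalization by average sentence length, and the function names themselves.

```python
import re


def tokens(sentence):
    # Lowercased word tokens; this tokenization is a simplifying assumption.
    return re.findall(r"\w+", sentence.lower())


def is_cognate(a, b, prefix_len=4):
    # Hypothetical cognate test, not taken from the abstract:
    # tokens containing digits must be identical; other tokens must both
    # have at least prefix_len characters and share that prefix.
    if any(c.isdigit() for c in a) or any(c.isdigit() for c in b):
        return a == b
    return (
        len(a) >= prefix_len
        and len(b) >= prefix_len
        and a[:prefix_len] == b[:prefix_len]
    )


def cognateness(src, tgt):
    # Greedily pair each source token with the first unused cognate in the
    # target, then divide the number of pairs by the average token count.
    src_tokens, tgt_tokens = tokens(src), tokens(tgt)
    if not src_tokens or not tgt_tokens:
        return 0.0
    unused = list(tgt_tokens)
    matched = 0
    for s in src_tokens:
        for t in unused:
            if is_cognate(s, t):
                unused.remove(t)
                matched += 1
                break
    return matched / ((len(src_tokens) + len(tgt_tokens)) / 2)


if __name__ == "__main__":
    english = "Parliament adopted the budget in 1993."
    french_translation = "Le Parlement a adopté le budget en 1993."
    french_unrelated = "Il pleut souvent ici."
    print(cognateness(english, french_translation))
    print(cognateness(english, french_unrelated))
```

A score of this kind could stand in for, or be combined with, a length-based score when ranking candidate alignments. How the two criteria are actually weighted is described in the paper itself and is not reproduced here.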