Learning Method for Extraction of Partial Correspondence from Parallel Corpus
Authors: Ryo Terashima, Hiroshi Echizen-ya, K. Araki
Venue: 2009 International Conference on Asian Language Processing
Published: 2009-12-07
DOI: 10.1109/IALP.2009.69 (https://doi.org/10.1109/IALP.2009.69)
In machine translation based on a parallel corpus, it is effective to extract partial correspondences: pairs of source language (SL) and target language (TL) phrases in bilingual sentences. However, it is difficult to extract partial correspondences correctly and efficiently from a data-sparse corpus. In this paper, we propose a new learning method that extracts partial correspondences solely from the parallel corpus, without any analytical tools. In the proposed method, extraction rules are automatically acquired from bilingual sentences using bi-gram statistics within the sentences of each language and a similarity based on the Dice coefficient between SL words and TL words. The acquired extraction rules carry information about the first parts (e.g., "a", "the") or the last parts of phrases. Using these rules, the partial correspondences are extracted from the bilingual sentences correctly and efficiently. Evaluation experiments indicated that the proposed method can improve the translation quality of learning-type machine translation by correctly and efficiently extracting partial correspondences in bilingual sentences.
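To make the word-similarity step concrete, the following is a minimal sketch of a Dice coefficient computed from sentence-level co-occurrence in a parallel corpus. The abstract does not give the authors' exact formulation, so the definition used here, Dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)) with counts taken over sentence pairs, is an assumption based on the common form of the measure. The function name `dice_table` and the toy corpus are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product


def dice_table(pairs):
    """Return Dice scores for every SL/TL word pair that co-occurs.

    pairs: list of (sl_tokens, tl_tokens) bilingual sentence pairs.
    Counts are per sentence pair, so a word repeated inside one
    sentence is counted once (an assumption of this sketch).
    """
    sl_freq, tl_freq, co_freq = Counter(), Counter(), Counter()
    for sl_tokens, tl_tokens in pairs:
        sl_set, tl_set = set(sl_tokens), set(tl_tokens)
        sl_freq.update(sl_set)                    # sentence pairs containing each SL word
        tl_freq.update(tl_set)                    # sentence pairs containing each TL word
        co_freq.update(product(sl_set, tl_set))   # sentence pairs containing both words
    return {
        (s, t): 2.0 * c / (sl_freq[s] + tl_freq[t])
        for (s, t), c in co_freq.items()
    }


# Toy corpus invented for illustration; not data from the paper.
corpus = [
    ("the cat sleeps".split(), "neko ga nemuru".split()),
    ("the dog sleeps".split(), "inu ga nemuru".split()),
    ("the cat eats".split(), "neko ga taberu".split()),
]

scores = dice_table(corpus)
print(scores[("cat", "neko")])            # always co-occur
print(scores[("cat", "nemuru")])          # co-occur in one of two sentences each
print(scores.get(("cat", "inu"), 0.0))    # never co-occur, so absent from the table
```

Word pairs that always appear together score 1.0 and pairs that never share a sentence pair are simply missing from the table. How the proposed method combines such scores with bi-gram statistics to acquire extraction rules is not described in the abstract and is not modeled here.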