{"title":"Topic adaptation for Statistical Machine Translation","authors":"Mina Taraghi, Shahram Khadivi","doi":"10.1109/IRANIANCEE.2017.7985416","journal":{"name":"2017 Iranian Conference on Electrical Engineering (ICEE)","volume":"83 1","pages":"0"},"publicationDate":"2017-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/IRANIANCEE.2017.7985416"}
Topic adaptation for Statistical Machine Translation
We present new methods for Farsi-to-English topic adaptation in statistical machine translation. We incorporate topic information into the phrase table as sparse phrasal features, and we use sparse lexical features by determining the topic distribution of the source sentences in the development and test corpora. These sparse features cover many topic-related source-to-target translations. We also develop systems with features that measure the topical similarity between the source sentence and each hypothesis. These include features based on distributional profiles and two types of features that use bilingual topic models to compare the source sentence and the hypothesis through topic vectors in the source and target languages. Domain and topic adaptation are also combined to improve translation quality. Experiments are carried out on the Farsi-to-English Verbmobil and CNN datasets. BLEU improves by up to 2.0 points on the Verbmobil dataset; on the CNN dataset, we observe gains of up to 1.17 BLEU and several corrected individual translations.
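To make the two feature families in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how topic vectors are compared or how sparse features are named, so the cosine measure, the `threshold` parameter, the feature-name template, and the toy topic vectors below are all assumptions made for illustration.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors.

    Assumption: cosine is used only as an example measure; the abstract
    does not say which similarity function the paper uses.
    """
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topical evidence
    return dot / (norm_p * norm_q)


def sparse_topic_features(source_topics, src_phrase, tgt_phrase, threshold=0.3):
    """Emit one sparse indicator per sufficiently probable source topic.

    The feature-name template and the threshold are hypothetical.
    """
    return {
        f"topic{k}|{src_phrase}|{tgt_phrase}": 1.0
        for k, prob in enumerate(source_topics)
        if prob >= threshold
    }


def rank_hypotheses(source_topics, hypotheses):
    """Sort (text, topic_vector) pairs by similarity to the source, best first."""
    scored = [
        (cosine_similarity(source_topics, topics), text)
        for text, topics in hypotheses
    ]
    return [text for _, text in sorted(scored, key=lambda s: s[0], reverse=True)]


if __name__ == "__main__":
    # Toy topic distribution for a source sentence (hypothetical values).
    source = [0.7, 0.2, 0.1]
    candidates = [
        ("hypothesis about finance", [0.8, 0.1, 0.1]),
        ("hypothesis about rivers", [0.1, 0.1, 0.8]),
    ]
    print(rank_hypotheses(source, candidates))
    print(sparse_topic_features(source, "src_phrase", "tgt_phrase"))
```

In a real system, a score of this kind would enter the log-linear model as one additional feature whose weight is tuned on the development corpus, and the topic vectors would come from a trained (bilingual) topic model rather than from hand-written values.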