Machine Translation Approach for Vietnamese Diacritic Restoration
T. Do, Duy Binh Nguyen, Dang-Khoa Mac, D. Tran
2013 International Conference on Asian Language Processing, 2013-08-17
DOI: 10.1109/IALP.2013.30
Diacritic marks exist in many languages, such as French, German, Slovak, and Vietnamese. For various reasons, however, they are sometimes omitted in writing, which can make text without diacritics ambiguous for the reader. Automatic diacritic restoration has been studied for several languages using character-based, word-based, and point-wise approaches, among others. However, these approaches rely heavily on linguistic information and on the size of the training corpus, and some of them are language dependent. In this paper, we present a simple and effective restoration method that uses a machine translation approach as a new solution to this problem. The method has been applied to Vietnamese and integrated into an Android application named VIVA (Vietnamese Voice Assistant), which reads out the content of incoming text messages on a mobile phone. Our experiments show that the proposed method recovers diacritic marks with 99.0% accuracy.
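
The abstract does not describe how the training data were prepared. As an illustration only, one plausible way to frame diacritic restoration as translation is to treat text without diacritics as the source language and the original text as the target language. The Python sketch below builds such sentence pairs by stripping diacritics; the function names strip_diacritics and make_parallel_pairs are hypothetical and do not come from the paper.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'tiếng Việt' -> 'tieng Viet'."""
    # 'đ' and 'Đ' have no canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def make_parallel_pairs(sentences):
    """Return (source, target) pairs for a hypothetical translation model.

    The source side is the sentence without diacritics and the target side
    is the original sentence. This is an assumed setup, not the paper's
    documented pipeline.
    """
    return [(strip_diacritics(s), s) for s in sentences]


if __name__ == "__main__":
    # Six distinct Vietnamese words collapse to the same string once the
    # diacritics are removed, which is the ambiguity the abstract mentions.
    words = ["ma", "má", "mà", "mả", "mã", "mạ"]
    print({strip_diacritics(w) for w in words})

    for source, target in make_parallel_pairs(["Tôi đi học"]):
        print(source, "->", target)
```

Pairs of this kind could in principle be passed to any translation toolkit that accepts sentence-aligned source and target text. Which toolkit and which settings the authors used is not stated in the abstract.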