Maximum n-Gram HMM-based Name Transliteration: Experiment in NEWS 2009 on English-Chinese Corpus
Author: Yilu Zhou
Venue: NEWS@IJCNLP
Published: 2009-08-07
DOI: 10.3115/1699705.1699738 (https://doi.org/10.3115/1699705.1699738)
Citations: 3
Abstract
We propose an English-Chinese name transliteration system based on a maximum n-gram Hidden Markov Model. To handle the special challenges of a language pair consisting of an alphabet-based and a character-based language, we apply a two-phase transliteration model built from two HMMs: one between English and Chinese Pinyin, and another between Chinese Pinyin and Chinese characters. Our model improves on the traditional HMM by assigning the largest weight to the longest previously seen translation sequence of syllables. In our non-standard runs, a Web-mining module boosts performance by incorporating online popularity information about candidate translations. The model does not rely on any dictionaries; its probability tables are derived solely from the training corpus. In the NEWS 2009 shared task, our model achieved 0.462 Top-1 accuracy and 0.764 mean F-score.
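The two-phase idea in the abstract can be sketched as two chained Viterbi decodings: English syllables are first mapped to Pinyin syllables, and the resulting Pinyin sequence is then mapped to Chinese characters. The sketch below is illustrative only: the probability tables are tiny invented examples (the paper derives them from its training corpus), and the maximum n-gram weighting and Web-mining re-ranking are not modeled here.

```python
# Hedged sketch of a two-phase HMM transliteration pipeline.
# All probability tables below are toy values invented for illustration;
# they are NOT the paper's trained parameters.

import math

def viterbi(observations, states, trans_p, emit_p):
    """Return the most probable hidden-state sequence (log-space Viterbi).

    Missing transition/emission entries get a small floor probability
    so the search never takes log(0).
    """
    FLOOR = 1e-9
    # V[s] = best log-prob of any path ending in state s at the current step
    V = {s: math.log(emit_p[s].get(observations[0], FLOOR)) for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_V, new_path = {}, {}
        for s in states:
            best_prev, best_lp = max(
                ((p, V[p]
                  + math.log(trans_p[p].get(s, FLOOR))
                  + math.log(emit_p[s].get(obs, FLOOR))) for p in states),
                key=lambda x: x[1])
            new_V[s] = best_lp
            new_path[s] = path[best_prev] + [s]
        V, path = new_V, new_path
    return path[max(states, key=lambda s: V[s])]

# Phase 1: English syllables -> Pinyin (toy tables for "o-ba-ma").
pinyin_states = ["ao", "ba", "ma"]
emit1 = {"ao": {"o": 0.9}, "ba": {"ba": 0.9}, "ma": {"ma": 0.9}}
trans1 = {"ao": {"ba": 0.5, "ma": 0.5},
          "ba": {"ma": 0.8, "ba": 0.2},
          "ma": {"ma": 0.5, "ba": 0.5}}

# Phase 2: Pinyin -> Chinese characters (toy tables).
char_states = ["奥", "巴", "马"]
emit2 = {"奥": {"ao": 0.9}, "巴": {"ba": 0.9}, "马": {"ma": 0.9}}
trans2 = {"奥": {"巴": 0.6, "马": 0.4},
          "巴": {"马": 0.7, "巴": 0.3},
          "马": {"马": 0.5, "巴": 0.5}}

pinyin = viterbi(["o", "ba", "ma"], pinyin_states, trans1, emit1)
chars = viterbi(pinyin, char_states, trans2, emit2)
print("".join(chars))  # most likely character sequence under the toy tables
```

Chaining two separately trained HMMs this way keeps each model small: the English-to-Pinyin model handles the alphabet-to-syllable mismatch, while the Pinyin-to-character model resolves the many characters that share one Pinyin reading.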