Combining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration

D. Yang, Paul R. Dixon, Yi-Cheng Pan, T. Oonishi, Masanobu Nakamura, S. Furui
Venue: NEWS@IJCNLP. Published: 2009-08-07. DOI: 10.3115/1699705.1699724 (https://doi.org/10.3115/1699705.1699724)
Citations: 19

Abstract

This paper describes our system for the NEWS 2009 Machine Transliteration Shared Task (NEWS 2009). We participated only in the standard run, which uses direct orthographical mapping (DOP) between two languages without any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOP machine transliteration: the first CRF segments a source word into chunks, and the second CRF maps the chunks to a word in the target language. On its own, the two-step CRF model obtains slightly lower top-1 accuracy than a state-of-the-art n-gram joint source-channel model, but combining the CRF model with the joint source-channel model leads to improvements in all the tasks. The official results of our system in the NEWS 2009 shared task confirm its effectiveness: we achieved a top-1 accuracy of 0.627 for Japanese transliterated to Japanese Kanji (JJ), 0.713 for English-to-Chinese (E2C), and 0.510 for English-to-Japanese Katakana (E2J).
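The data flow described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the B/I labelling scheme, the lookup table standing in for the second CRF, the function names, the example scores, and the linear interpolation used to combine the two models' n-best lists are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def chunks_from_labels(word, labels):
    """Step 1 output: turn per-character B/I labels into source chunks.

    In a real system the labels would come from the first CRF; here they
    are supplied by hand (assumption).
    """
    chunks = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not chunks:
            chunks.append(ch)       # start a new chunk
        else:
            chunks[-1] += ch        # extend the current chunk
    return chunks


def map_chunks(chunks, table):
    """Step 2 stand-in: map each chunk with a lookup table.

    A real system would use the second CRF instead of this toy table.
    """
    return "".join(table[c] for c in chunks)


def combine_nbest(nbest_a, nbest_b, weight=0.5):
    """Merge two n-best lists by linear score interpolation (assumed scheme)."""
    scores = defaultdict(float)
    for cand, p in nbest_a:
        scores[cand] += weight * p
    for cand, p in nbest_b:
        scores[cand] += (1.0 - weight) * p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: labels, table entries and scores are made up.
chunks = chunks_from_labels("smith", ["B", "B", "I", "B", "I"])
toy_table = {"s": "ス", "mi": "ミ", "th": "ス"}
crf_output = map_chunks(chunks, toy_table)

crf_nbest = [(crf_output, 0.6), ("スミズ", 0.4)]
jsc_nbest = [("スミス", 0.7), ("スミト", 0.3)]
ranked = combine_nbest(crf_nbest, jsc_nbest)
print(ranked[0][0])
```

Separating segmentation from mapping mirrors the two-step design in the abstract, and a candidate proposed by both models accumulates score from each list, which is one simple way a combination could outperform either model alone.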