A joint source-channel model for machine transliteration

Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04 Pub Date: 2004 DOI: 10.3115/1218955.1218976

Li Haizhou, Zhang Min, Su Jian

Citations: 280

Abstract

Most foreign names are transliterated into Chinese, Japanese or Korean with approximate phonetic equivalents. The transliteration is usually achieved through intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages through a joint source-channel model, also called the n-gram transliteration model (TM). With the n-gram TM, we automate the orthographic alignment process to derive the aligned transliteration units from a bilingual dictionary. The n-gram TM under the DOM framework greatly reduces system development effort and substantially improves transliteration accuracy over other state-of-the-art machine learning algorithms. The modeling framework is validated through several experiments on the English-Chinese language pair.
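The abstract does not give the model's formula, so the following is only a minimal sketch of what an n-gram model over aligned transliteration units could look like. It assumes a bigram (n = 2) over (English unit, Chinese unit) pairs and scores a name pair by the joint probability of its pair sequence. The aligned data, the function names `train_bigram` and `joint_log_prob`, and the add-alpha smoothing are all illustrative assumptions, not the paper's alignment procedure or estimator.

```python
from collections import Counter
import math

# Toy, hand-aligned data (hypothetical, for illustration only): each name
# is a sequence of (English unit, Chinese unit) pairs.
ALIGNED = [
    [("s", "史"), ("mi", "密"), ("th", "斯")],
    [("s", "史"), ("mi", "米"), ("th", "斯")],
    [("mi", "米"), ("ller", "勒")],
]

BOS = ("<s>", "<s>")    # start-of-name marker
EOS = ("</s>", "</s>")  # end-of-name marker


def train_bigram(aligned):
    """Count contexts and bigrams over transliteration-unit pairs."""
    context, bigram = Counter(), Counter()
    for pairs in aligned:
        seq = [BOS] + pairs + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return context, bigram


def joint_log_prob(pairs, context, bigram, vocab_size, alpha=1.0):
    """log P(E, C) as a product of add-alpha smoothed bigram terms.

    The smoothing scheme is an assumption made to keep unseen
    transitions finite; it is not taken from the paper.
    """
    seq = [BOS] + pairs + [EOS]
    logp = 0.0
    for prev, cur in zip(seq, seq[1:]):
        num = bigram[(prev, cur)] + alpha
        den = context[prev] + alpha * vocab_size
        logp += math.log(num / den)
    return logp


if __name__ == "__main__":
    context, bigram = train_bigram(ALIGNED)
    # Distinct pair types that can follow a context, including EOS.
    vocab = {p for seq in ALIGNED for p in seq} | {EOS}
    print(joint_log_prob(ALIGNED[0], context, bigram, len(vocab)))
```

Because each unit in this sketch is a source-target pair, a single model scores both sides jointly, which is one way to read the "joint source-channel" idea; transliteration in either direction would then amount to searching for the pair sequence with the highest joint score given one side.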
