Error-Driven Adaptive Language Modeling for Chinese Pinyin-to-Character Conversion

2011 International Conference on Asian Language Processing, Pub Date: 2011-11-15, DOI: 10.1109/IALP.2011.46

J. Huang, D. Powers

Abstract

The performance of Chinese Pinyin-to-Character conversion degrades severely when the characteristics of the training data differ from those of the data being converted. Because natural language is highly variable and uncertain, it is impossible to build a single complete, general language model that suits every task. Traditional adaptive MAP models mix task-independent data with task-dependent data using a mixture coefficient, but we can never predict what style of language users will have or what new domains will appear. This paper presents a statistical, error-driven adaptive language modeling approach for Chinese Pinyin input systems. The model is adapted incrementally whenever an error occurs during Pinyin-to-Character conversion, and it significantly improves the Pinyin-to-Character conversion rate.
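The abstract does not give the paper's update rule, so the following is only a minimal sketch of the general idea of error-driven adaptation, not the authors' method. It assumes a character bigram model with add-one smoothing and a fixed count increment `step` applied each time the user corrects a conversion; the class `ErrorDrivenBigramLM`, its `feedback` method, and the counts are hypothetical. The point it illustrates is that, unlike a MAP-style mixture with a preset coefficient, the model changes only when its output disagrees with the user's choice.

```python
from collections import defaultdict


class ErrorDrivenBigramLM:
    """Toy character bigram model that adapts only on corrected errors.

    Hypothetical sketch: the add-one smoothing and the fixed `step`
    increment are illustrative assumptions, not the paper's method.
    """

    def __init__(self, bigram_counts, vocab_size, step=10.0):
        self.counts = defaultdict(float)   # (prev_char, char) -> count
        self.context = defaultdict(float)  # prev_char -> total count
        for (prev, cur), c in bigram_counts.items():
            self.counts[(prev, cur)] += c
            self.context[prev] += c
        self.vocab_size = vocab_size
        self.step = step

    def prob(self, prev, cur):
        # Add-one (Laplace) smoothed bigram probability P(cur | prev).
        return (self.counts[(prev, cur)] + 1.0) / (self.context[prev] + self.vocab_size)

    def convert(self, prev, candidates):
        # Choose the homophone candidate the model currently rates highest.
        return max(candidates, key=lambda c: self.prob(prev, c))

    def feedback(self, prev, candidates, correct):
        """Compare the model's output with the user's choice; adapt only on error."""
        guess = self.convert(prev, candidates)
        if guess == correct:
            return False  # no error: the model is left unchanged
        # Error: shift probability mass toward the corrected bigram.
        self.counts[(prev, correct)] += self.step
        self.context[prev] += self.step
        return True


# Toy usage with made-up counts: the training data favours 病理 (pathology)
# over 病例 (medical case), but this user works with case records.
lm = ErrorDrivenBigramLM({("病", "理"): 20, ("病", "例"): 5}, vocab_size=6000)
cands = ["理", "例", "里"]              # homophones for the pinyin "li"
before = lm.convert("病", cands)        # "理"
lm.feedback("病", cands, "例")          # error: count becomes 15, still below 20
after_one = lm.convert("病", cands)     # still "理"
lm.feedback("病", cands, "例")          # second error: count becomes 25
after_two = lm.convert("病", cands)     # now "例"
```

Here two corrections are needed before the model switches, which shows the incremental character of the adaptation; a larger `step` would adapt faster but would also let a single accidental correction distort the model more.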
