Morpheme Recovery Based on Naïve Bayes Model

The KIPS Transactions: Part B. Pub Date: 2012-06-30. DOI: 10.3745/KIPSTB.2012.19B.3.195

Jae-Hoon Kim, Kil-Ho Jeon


Abstract

In Korean morphological analysis, surface forms altered by spelling changes must be recovered to their base forms. Because Korean is agglutinative, part-of-speech (POS) tagging is also difficult without morphological analysis. Morpheme recovery is a notorious problem in Korean morphological analysis and has conventionally been handled by morpheme recovery rules, which generate morphological ambiguity that is then resolved by POS tagging. In this paper, we propose a morpheme recovery scheme based on machine learning methods such as Naïve Bayes models. The input features of the models are the surrounding context of the syllable in which the spelling change occurs, and the categories (classes) of the models are the recovered syllables. A POS tagging system using the proposed model achieved an F1-score of 97.5% on the ETRI tree-tagged corpus. We therefore conclude that the proposed model is very useful for handling morpheme recovery in Korean.
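The classification setup described in the abstract (context syllables as features, recovered syllables as classes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-syllable window on each side, the add-one (Laplace) smoothing, the class name `SyllableNaiveBayes`, the `<s>` boundary marker, and the six-item toy training set. It is not the authors' implementation and says nothing about how the ETRI corpus was actually used.

```python
import math
from collections import Counter, defaultdict


class SyllableNaiveBayes:
    """Toy Naive Bayes classifier: context syllables -> recovered syllable(s).

    Hypothetical sketch; the window size and smoothing are assumptions.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # additive smoothing constant
        self.class_counts = Counter()               # label -> number of samples
        self.feature_counts = defaultdict(Counter)  # (position, label) -> syllable counts
        self.vocab = defaultdict(set)               # position -> syllables seen there

    def fit(self, samples):
        for context, label in samples:
            self.class_counts[label] += 1
            for pos, syllable in enumerate(context):
                self.feature_counts[(pos, label)][syllable] += 1
                self.vocab[pos].add(syllable)
        return self

    def log_posterior(self, context, label):
        # log P(label) + sum over positions of log P(syllable at position | label)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for pos, syllable in enumerate(context):
            seen = self.feature_counts[(pos, label)][syllable]
            # "+ 1" reserves probability mass for syllables unseen at this position
            denom = self.class_counts[label] + self.alpha * (len(self.vocab[pos]) + 1)
            score += math.log((seen + self.alpha) / denom)
        return score

    def predict(self, context):
        return max(self.class_counts, key=lambda lab: self.log_posterior(context, lab))


# Toy data (invented): (previous, current, next) syllables -> recovered form.
# "갔" is the contracted surface form of 가 + 았, "왔" of 오 + 았, "봤" of 보 + 았.
TRAIN = [
    (("<s>", "갔", "다"), "가았"),
    (("나", "갔", "다"), "가았"),
    (("<s>", "왔", "다"), "오았"),
    (("<s>", "봤", "다"), "보았"),
    (("<s>", "가", "다"), "가"),
    (("<s>", "보", "다"), "보"),
]

if __name__ == "__main__":
    model = SyllableNaiveBayes().fit(TRAIN)
    print(model.predict(("<s>", "갔", "다")))
```

In this sketch a syllable that needs no recovery is simply labeled with itself, so a single classifier covers both changed and unchanged syllables; a real system would need a much larger annotated corpus and possibly richer context features.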


