Morpheme Recovery Based on Naïve Bayes Model

Jae-Hoon Kim, Kil-Ho Jeon
Journal: The KIPS Transactions: Part B
Published: 2012-06-30
DOI: 10.3745/KIPSTB.2012.19B.3.195 (https://doi.org/10.3745/KIPSTB.2012.19B.3.195)
Citations: 0

Abstract

In Korean, spelling changes in their various forms must be recovered into base forms during morphological analysis, and part-of-speech (POS) tagging is difficult without morphological analysis because Korean is agglutinative. This is one of the notorious problems in Korean morphological analysis. It has traditionally been solved with morpheme recovery rules, which generate morphological ambiguity that is then resolved by POS tagging. In this paper, we propose a morpheme recovery scheme based on machine learning methods such as Naïve Bayes models. The input features of the models are the surrounding context of the syllable in which the spelling change occurs, and the categories of the models are the recovered syllables. A POS tagging system using the proposed model achieved an F-score of 97.5% on the ETRI tree-tagged corpus. We therefore conclude that the proposed model is useful for handling morpheme recovery in Korean.
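
The feature and category setup described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the three-syllable context window, the add-one smoothing, the class name `NaiveBayesRecoverer`, the `stem+ending` label notation, and the handful of toy training examples are all assumptions introduced here for demonstration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRecoverer:
    """Hypothetical Naive Bayes classifier: syllable context -> recovered form."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # add-alpha smoothing (assumed)
        self.class_counts = Counter()               # label -> number of examples
        self.feature_counts = defaultdict(Counter)  # (label, position) -> value counts
        self.vocab = defaultdict(set)               # position -> values seen there

    def fit(self, examples):
        for context, label in examples:
            self.class_counts[label] += 1
            for pos, value in enumerate(context):
                self.feature_counts[(label, pos)][value] += 1
                self.vocab[pos].add(value)
        return self

    def log_score(self, context, label):
        # log P(label) + sum over positions of log P(value at position | label)
        total = sum(self.class_counts.values())
        n_label = self.class_counts[label]
        score = math.log(n_label / total)
        for pos, value in enumerate(context):
            count = self.feature_counts[(label, pos)][value]
            vocab_size = len(self.vocab[pos]) + 1   # one extra slot for unseen values
            score += math.log((count + self.alpha) / (n_label + self.alpha * vocab_size))
        return score

    def predict(self, context):
        # Choose the recovered form with the highest posterior score.
        return max(self.class_counts, key=lambda label: self.log_score(context, label))


# Toy data invented for illustration only (not taken from the ETRI corpus).
# Each context is (previous syllable, changed syllable, next syllable);
# each label is the recovered form of the middle syllable.
TOY_EXAMPLES = [
    (("<s>", "갔", "다"), "가+았"),
    (("<s>", "가", "서"), "가+아"),
    (("<s>", "가", "고"), "가"),
    (("<s>", "했", "다"), "하+였"),
    (("<s>", "서", "서"), "서+어"),
    (("<s>", "서", "고"), "서"),
    (("<s>", "왔", "다"), "오+았"),
]

model = NaiveBayesRecoverer().fit(TOY_EXAMPLES)
print(model.predict(("<s>", "가", "서")))
print(model.predict(("<s>", "가", "고")))
```

The point of the sketch is that the same surface syllable can map to different recovered forms depending on its neighbors, which is why the context, rather than the syllable alone, serves as the feature set. A real system would train on an annotated corpus and would likely use a richer context than the single neighboring syllables assumed here.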