Indonesian Part of Speech Tagging Using Hidden Markov Model – Ngram & Viterbi

D. E. Cahyani, Mtchael Juan Vindiyanto
DOI: 10.1109/ICITISEE48480.2019.9003989 (https://doi.org/10.1109/ICITISEE48480.2019.9003989)
Published in: 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE)
Publication date: 2019-11-01
Citations: 6

Abstract

Part-of-speech (POS) tagging is the process of labelling each word in a sentence with its word class. One problem in POS tagging is that some words are spelled the same but take different POS tags depending on the context of the sentence (ambiguity). This study addresses the problem with the Hidden Markov Model (HMM) N-gram algorithm and the Viterbi algorithm. It describes the development of an Indonesian POS tagging system using the HMM N-gram algorithm (bigram and trigram) with the Viterbi algorithm, and compares the results of the HMM bigram and HMM trigram models. A manually labelled Indonesian corpus, the Indonesian Manually Tagged Corpus, serves as the system's knowledge base. The corpus is first processed with the HMM N-gram algorithm to obtain the rules. The Viterbi algorithm then applies these rules to the input data to determine the POS tags with the highest probability. The best accuracy, 77.56%, is obtained with the HMM bigram and Viterbi algorithm, while the best accuracy of the HMM trigram and Viterbi algorithm is 61.67%. The results show that the HMM N-gram and Viterbi approach can resolve tag ambiguity, and that the HMM bigram is more accurate than the HMM trigram.
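The abstract describes the pipeline only at a high level. As a rough illustration, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the authors' implementation: the add-one smoothing, lower-casing, token-level accuracy metric (the abstract does not say how accuracy was computed), the helper names, and the tiny toy corpus are all assumptions made for this example, and the tags (PRP, MD, VB, NN, PR) are illustrative labels only. The word "bisa" ("can" or "venom") shows the kind of ambiguity the paper targets.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence


def train_bigram_hmm(tagged_sentences):
    """Count tag-bigram transitions and tag-word emissions in a tagged corpus."""
    transition = defaultdict(Counter)  # transition[prev_tag][tag] -> count
    emission = defaultdict(Counter)    # emission[tag][word] -> count
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transition[prev][tag] += 1
            emission[tag][word.lower()] += 1
            prev = tag
    return transition, emission


def log_prob(counts, key, n_outcomes):
    """Add-one (Laplace) smoothed log-probability of `key` under `counts`."""
    total = sum(counts.values())
    return math.log((counts[key] + 1) / (total + n_outcomes))


def viterbi(words, transition, emission):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = list(emission)
    n_tags = len(tags)
    # +1 reserves probability mass for words never seen in training.
    n_vocab = len({w for counts in emission.values() for w in counts}) + 1
    words = [w.lower() for w in words]

    # best[i][tag] = (log score of the best path ending in `tag` at i, previous tag)
    best = [{t: (log_prob(transition[START], t, n_tags)
                 + log_prob(emission[t], words[0], n_vocab), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + log_prob(transition[p], t, n_tags)) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + log_prob(emission[t], words[i], n_vocab), prev)
        best.append(column)

    # Backtrack from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def accuracy(tagged_sentences, transition, emission):
    """Token-level accuracy of Viterbi tags against gold tags."""
    correct = total = 0
    for sentence in tagged_sentences:
        predicted = viterbi([w for w, _ in sentence], transition, emission)
        correct += sum(p == gold for p, (_, gold) in zip(predicted, sentence))
        total += len(sentence)
    return correct / total if total else 0.0


# Hypothetical toy corpus: "bisa" is a modal ("can") or a noun ("venom").
corpus = [
    [("saya", "PRP"), ("bisa", "MD"), ("makan", "VB")],
    [("dia", "PRP"), ("bisa", "MD"), ("tidur", "VB")],
    [("ular", "NN"), ("itu", "PR"), ("punya", "VB"), ("bisa", "NN")],
]

if __name__ == "__main__":
    transition, emission = train_bigram_hmm(corpus)
    print(viterbi(["saya", "bisa", "makan"], transition, emission))  # ['PRP', 'MD', 'VB']
    print(viterbi(["ular", "itu", "punya", "bisa"], transition, emission))  # ['NN', 'PR', 'VB', 'NN']
```

Working in log space avoids numerical underflow on long sentences. In a trigram variant, each transition would instead be conditioned on the two previous tags, P(t_i | t_{i-2}, t_{i-1}), so the Viterbi state becomes a pair of tags; that extension is not shown here.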