Semi-supervised approach for Persian word sense disambiguation

Mohamadreza Mahmoodvand, Maryam Hourali
2017 7th International Conference on Computer and Knowledge Engineering (ICCKE), October 2017
DOI: 10.1109/ICCKE.2017.8167937
Citations: 5

Abstract

Word-sense disambiguation (WSD) is a key problem in natural language processing. Language exists to convey specific concepts to an audience, and those concepts are built from the meanings of the words used. To identify the concepts in a text correctly, a system must therefore identify the role and meaning of each word. The task becomes harder when a word can take different meanings depending on the words around it. Because a growing number of practical applications are being developed for Persian, finding a solution to Persian word-sense disambiguation has become essential. The biggest challenge for Persian WSD is the lack of training data. To address this problem, this research employs a machine learning approach that requires minimal supervision. The method disambiguates word senses using a defined set of features for each target word together with a collaborative learning method. A corpus extracted from news published by news agencies serves as the reference corpus. Evaluated on this corpus for three ambiguous words, the implemented method correctly identified word meanings in 5368 documents, with 88% recall, 95% precision, and 93% accuracy.
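The abstract names the ingredients of the method (a small labeled seed, a defined set of features per target word, a collaborative learning step, and precision/recall/accuracy scoring) but not their details. The block below is a minimal sketch of one common semi-supervised technique of this kind, two-view co-training, written under stated assumptions; it is not the authors' implementation. The two feature views (immediate neighbours versus the rest of the sentence), the naive Bayes classifiers, the 0.6 confidence threshold and all helper names (NaiveBayes, co_train, disambiguate, prf_accuracy) are hypothetical, and the transliterated Persian sentences around the ambiguous word shir (milk or lion) are toy data rather than the paper's news corpus or its three target words.

```python
# Hypothetical sketch of two-view co-training for WSD; not the paper's method.
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over lists of feature strings, add-one smoothing."""

    def fit(self, feature_lists, senses):
        self.prior = Counter(senses)
        self.total = len(senses)
        self.counts = {s: Counter() for s in self.prior}
        for feats, sense in zip(feature_lists, senses):
            self.counts[sense].update(feats)
        self.vocab_size = len({f for c in self.counts.values() for f in c})
        return self

    def predict(self, feats):
        """Return (most probable sense, its posterior probability)."""
        log_scores = {}
        for s, c in self.counts.items():
            denom = sum(c.values()) + self.vocab_size
            log_scores[s] = math.log(self.prior[s] / self.total) + sum(
                math.log((c[f] + 1) / denom) for f in feats)
        top = max(log_scores.values())  # subtract the max for numerical stability
        weights = {s: math.exp(v - top) for s, v in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def local_view(tokens, i):
    """View 1 (assumed): the words immediately left and right of the target."""
    return [f"{side}={tokens[j]}" for side, j in (("L", i - 1), ("R", i + 1))
            if 0 <= j < len(tokens)]


def context_view(tokens, i):
    """View 2 (assumed): bag of words over the rest of the sentence."""
    return [f"W={t}" for j, t in enumerate(tokens) if j != i]


VIEWS = (local_view, context_view)


def train_views(labeled):
    """Train one classifier per view on [((tokens, target_index), sense)]."""
    items, senses = zip(*labeled)
    return [NaiveBayes().fit([view(t, i) for t, i in items], senses)
            for view in VIEWS]


def co_train(labeled, unlabeled, rounds=5, per_round=1, threshold=0.6):
    """Grow the labeled set with each view's most confident guesses."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        chosen = {}  # pool index -> sense proposed by a confident view
        for clf, view in zip(train_views(labeled), VIEWS):
            cands = [(clf.predict(view(t, i)), k)
                     for k, (t, i) in enumerate(pool) if k not in chosen]
            cands.sort(key=lambda c: -c[0][1])  # most confident first
            for (sense, conf), k in cands[:per_round]:
                if conf >= threshold:
                    chosen[k] = sense
        if not chosen:
            break
        labeled += [(pool[k], s) for k, s in chosen.items()]
        pool = [x for k, x in enumerate(pool) if k not in chosen]
    return train_views(labeled)


def disambiguate(models, tokens, i):
    """Return the sense proposed by whichever view is more confident."""
    guesses = [clf.predict(view(tokens, i)) for clf, view in zip(models, VIEWS)]
    return max(guesses, key=lambda g: g[1])[0]


def prf_accuracy(gold, pred, positive):
    """Binary precision, recall and accuracy with one sense as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, (tp + tn) / len(pairs)


# Toy transliterated Persian data for "shir" (milk or lion); hypothetical.
seed = [((["livan", "shir", "garm"], 1), "milk"),
        ((["shir", "dar", "jangal"], 0), "lion")]
raw = [(["yek", "livan", "shir", "sard"], 2),
       (["shir", "dar", "jangal", "ast"], 0),
       (["livan", "shir", "nushidam"], 1)]
models = co_train(seed, raw)
print(disambiguate(models, ["shir", "nushidam"], 0))  # → milk
```

Called as co_train(seed, []), i.e. with the two seed sentences alone, the same query returns "lion": nushidam never appeared with a labeled sense, and add-one smoothing gives an unseen word slightly more weight under the sparser lion class. Co-training first labels the raw sentences, including "livan shir nushidam", so nushidam becomes evidence for milk. prf_accuracy uses the usual binary definitions with one sense as the positive class; the abstract does not say how its reported figures were computed (for example, per word or averaged over the three words).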