A Word Sense Disambiguation Technique for Sinhala

Janindu Arukgoda, V. Bandara, Samiththa Bashani, Vijayindu Gamage, Daya C. Wimalasuriya
2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology. Published 2014-03-12. DOI: 10.1109/ICAIET.2014.42 (https://doi.org/10.1109/ICAIET.2014.42)
Citations: 3

Abstract

Word sense disambiguation is the task of identifying the implied sense of a polysemous word in a given context. Much work has been done on word sense disambiguation for English, but very little for Sinhala. This paper presents ongoing work on a rule-based word sense disambiguation algorithm that uses the Sinhala WordNet developed at the University of Moratuwa as its basis. This is the first attempt to build such an algorithm for Sinhala. For this task we have implemented the Simplified Lesk algorithm with our own modifications under two assumptions: 'one sense per collocation' and 'one sense per discourse'. We define a window around the target polysemous word and count the words in that window that overlap with each sense of the target word. Because there have been few significant initiatives on natural language processing applications for Sinhala, critical resources such as functioning morphological analysis tools are not available, which makes accurate word sense disambiguation even harder. Using web articles as the data source, the system attempted to disambiguate 10 instances of polysemous words and was evaluated at a precision of 63% and an F-score of 0.63.
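The window-and-overlap step described in the abstract can be sketched roughly as follows. This is a minimal illustration of the standard Simplified Lesk idea only, not the authors' modified algorithm. The function name `simplified_lesk`, the sense labels, and the toy English sense signatures are all hypothetical and stand in for entries that would, in the paper's setting, come from the Sinhala WordNet. The 'one sense per collocation' and 'one sense per discourse' assumptions are not modelled here.

```python
from typing import Dict, List, Optional, Set


def simplified_lesk(
    tokens: List[str],
    target_index: int,
    sense_signatures: Dict[str, Set[str]],
    window: int = 3,
) -> Optional[str]:
    """Pick the sense whose signature shares the most words with the context window.

    `sense_signatures` maps a (hypothetical) sense label to the set of words
    taken from that sense's gloss and examples.
    Returns None when no sense overlaps with the window at all.
    """
    # Collect up to `window` tokens on each side of the target, excluding the target.
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    context = {
        tok.lower()
        for i, tok in enumerate(tokens[start:end], start=start)
        if i != target_index
    }

    best_sense: Optional[str] = None
    best_overlap = 0
    for sense, signature in sense_signatures.items():
        # Overlap = number of distinct context words that also occur in the signature.
        overlap = len(context & signature)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy English data, assumed purely for illustration.
toy_senses = {
    "bank.finance": {"money", "deposit", "loan", "institution"},
    "bank.river": {"river", "water", "slope", "land"},
}
toy_tokens = "he sat on the bank of the river and watched the water".split()
chosen = simplified_lesk(toy_tokens, target_index=4, sense_signatures=toy_senses)
```

In this toy run the window around "bank" contains "river", so the river sense is the only one with a non-zero overlap. Without a morphological analyser, inflected forms would fail to match their signature entries, which is consistent with the difficulty the abstract reports for Sinhala.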