A Word Sense Disambiguation Technique for Sinhala

2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology. Pub Date: 2014-03-12. DOI: 10.1109/ICAIET.2014.42

Janindu Arukgoda, V. Bandara, Samiththa Bashani, Vijayindu Gamage, Daya C. Wimalasuriya

Citations: 3

Abstract

Word sense disambiguation is the task of identifying the implied sense of a polysemous word in a given context. There have been many efforts on word sense disambiguation for English, but very little work has been done for Sinhala. This paper presents ongoing work on a rule-based word sense disambiguation algorithm that uses the Sinhala WordNet developed at the University of Moratuwa as its basis. This is the first attempt at building such an algorithm for Sinhala. For this task we implemented the Simplified Lesk algorithm, with our own modifications, under two assumptions: 'one sense per collocation' and 'one sense per discourse'. We define a window of fixed size around the target polysemous word and count the words in that window that overlap with each sense of the target word. Because there have been few significant natural language processing initiatives for Sinhala, critical resources such as working morphological analysis tools are not available, which makes accurate word sense disambiguation even harder. Using web articles as the data source, the system attempted to disambiguate 10 instances of polysemous words and was evaluated at a precision of 63% and an F-score of 0.63.
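The abstract names the Simplified Lesk algorithm but does not describe the authors' modifications, so the following is only a minimal sketch of the generic procedure: for each occurrence of a target word, take the words within a fixed window on either side, count how many appear in each sense's gloss, and choose the sense with the largest overlap. The sense inventory, the English glosses (used for readability in place of Sinhala WordNet entries), the stopword list, the window size of 3, and the reading of 'one sense per discourse' as a majority vote over all occurrences in one text are all hypothetical assumptions, not the paper's method.

```python
from collections import Counter

# Hypothetical toy sense inventory standing in for Sinhala WordNet glosses.
SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits and lends money",
        "bank.river": "the sloping land beside a river or lake",
    }
}

# Hypothetical stopword list; function words are ignored when counting overlap.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "that",
             "on", "in", "at", "he", "she", "we", "his", "her"}


def tokenize(text):
    """Lowercase whitespace tokenization with edge punctuation stripped.
    No morphological analysis is done (the abstract notes none was available)."""
    return [t.strip(".,;:!?\"'()").lower() for t in text.split()]


def signature(gloss):
    """Set of content words in a sense gloss."""
    return {w for w in tokenize(gloss) if w and w not in STOPWORDS}


def lesk_at(tokens, index, senses, window=3):
    """Simplified Lesk for one occurrence: return the sense whose gloss shares
    the most content words with the +/- `window` tokens around tokens[index]."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    context = {
        w for j, w in enumerate(tokens[lo:hi], start=lo)
        if j != index and w and w not in STOPWORDS
    }
    best_sense, best_overlap = None, 0
    for sense_id, gloss in senses.items():
        overlap = len(context & signature(gloss))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense  # None when no gloss overlaps the window


def disambiguate_discourse(text, target, inventory=SENSES, window=3):
    """Disambiguate every occurrence of `target` in `text`, then apply an assumed
    'one sense per discourse' rule: every occurrence gets the majority sense."""
    tokens = tokenize(text)
    senses = inventory.get(target, {})
    positions = [i for i, w in enumerate(tokens) if w == target]
    votes = Counter(
        s for s in (lesk_at(tokens, i, senses, window) for i in positions) if s
    )
    if not votes:
        return {}
    winner = votes.most_common(1)[0][0]
    return {i: winner for i in positions}
```

For example, in `"He walked along the river to the bank. Later the bank flooded near the lake."` the first `bank` (token 7) overlaps the river gloss through `river`, while the second (token 10) has no overlap within its window; the discourse rule still assigns `bank.river` to both. In `"She opened an account at the bank to deposit money."` only `money` matches the finance gloss, because `deposit` does not match `deposits` without a stemmer, a small illustration of why the missing morphological tools mentioned in the abstract hurt overlap-based methods.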
