Janindu Arukgoda, V. Bandara, Samiththa Bashani, Vijayindu Gamage, Daya C. Wimalasuriya
A Word Sense Disambiguation Technique for Sinhala
2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology, published 2014-03-12
DOI: 10.1109/ICAIET.2014.42 (https://doi.org/10.1109/ICAIET.2014.42)
Word sense disambiguation is the task of identifying the intended sense of a polysemous word in a given context. Many word sense disambiguation efforts exist for English, but very little work has been done for Sinhala. This paper presents ongoing work on a rule-based word sense disambiguation algorithm built on the Sinhala WordNet developed at the University of Moratuwa. This is the first attempt at building such an algorithm for Sinhala. For this task we have implemented the Simplified Lesk algorithm with our own modifications under two assumptions: 'one sense per collocation' and 'one sense per discourse'. We define a window around the target polysemous word and count the words in that window that overlap with each sense of the target word. Because there have been few significant initiatives on natural language processing applications for Sinhala, critical resources such as functioning morphological analysis tools are not available, which makes accurate word sense disambiguation even harder. Using web articles as the data source, the system attempted to disambiguate 10 instances of polysemous words and was evaluated as achieving a precision of 63% and an F-score of 0.63.
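The window-overlap step described above can be illustrated with a minimal sketch of the generic Simplified Lesk idea. It is not the authors' implementation and does not include their modifications or the 'one sense per discourse' handling. The sense inventory `SENSES`, the stopword list, the default window size, and all function names are hypothetical; a toy English entry stands in for Sinhala WordNet glosses, and tokens are matched as exact strings because the abstract notes that no morphological analyser is available.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy sense inventory: an English stand-in for WordNet glosses.
SENSES: Dict[str, List[Tuple[str, str]]] = {
    "bank": [
        ("bank#finance", "an institution that accepts deposits and lends money"),
        ("bank#river", "sloping land beside a river or lake"),
    ],
}

# Assumed stopword list; function words carry no sense information.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "beside"}

PUNCT = ".,;:!?'\""


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, strip basic punctuation (no stemming)."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def context_window(tokens: List[str], index: int, size: int) -> List[str]:
    """Up to `size` tokens on each side of position `index`, target excluded."""
    return tokens[max(0, index - size):index] + tokens[index + 1:index + 1 + size]


def simplified_lesk(tokens: List[str], index: int, size: int = 4) -> Tuple[Optional[str], int]:
    """Return (sense_id, overlap) for the sense whose gloss shares the most
    content words with the window; ties go to the earlier-listed sense."""
    senses = SENSES.get(tokens[index], [])
    if not senses:
        return None, 0
    context = set(context_window(tokens, index, size)) - STOPWORDS
    best_sense, best_overlap = senses[0][0], 0
    for sense_id, gloss in senses:
        overlap = len(context & (set(tokenize(gloss)) - STOPWORDS))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense, best_overlap


if __name__ == "__main__":
    words = tokenize("He sat on the bank of the river and watched the water.")
    print(simplified_lesk(words, words.index("bank")))  # ('bank#river', 1)
```

In this sketch, the context word "deposit" would not match the gloss word "deposits", since matching is on exact surface forms. That is the kind of miss a morphological analyser would normally prevent, which is consistent with the difficulty the abstract reports for Sinhala.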