Naïve Bayes classifier for Kashmiri word sense disambiguation
Tawseef Ahmad Mir, Aadil Ahmad Lawaye
Sādhanā, published 2024-07-29
DOI: 10.1007/s12046-024-02551-7 (https://doi.org/10.1007/s12046-024-02551-7)
Abstract
Many Natural Language Processing (NLP) applications, such as machine translation, document clustering, and information retrieval, rely on Word Sense Disambiguation (WSD). WSD automatically predicts the sense of an ambiguous word that best fits its context. While interpreting the meaning of natural language may seem easy for humans, machines need to process large amounts of data to perform similar tasks. In this paper, we propose an automatic WSD system for the Kashmiri language based on the Naive Bayes classifier. To the best of our knowledge, this work is the first attempt at developing a WSD system for the Kashmiri language. The system uses Bag-of-Words (BoW) and Part-of-Speech (PoS) features. Experiments are carried out on a manually crafted sense-tagged dataset covering 60 ambiguous Kashmiri words, selected based on their frequency in the collected raw corpus. The senses used to annotate these words are taken from Kashmiri WordNet. Performance is measured using accuracy, precision, recall, and F1 measure. The proposed WSD model achieved its best performance (accuracy = 89.92%, precision = 0.84, recall = 0.89, F1 measure = 0.86) when PoS and BoW features were used together.
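To make the approach concrete, the following is a minimal sketch of a Naive Bayes sense classifier that combines BoW and PoS features. It is not the authors' implementation: the abstract does not state the Naive Bayes variant, the smoothing method, or the context window, so the multinomial model with add-one smoothing used here is an assumption. The names `extract_features` and `NaiveBayesWSD` are hypothetical, and the toy English data for the word "bank" stands in for the Kashmiri sense-tagged dataset, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def extract_features(tagged_context, use_bow=True, use_pos=True):
    """Map (word, pos_tag) pairs around the target word to feature strings."""
    features = []
    for word, tag in tagged_context:
        if use_bow:
            features.append("w=" + word.lower())  # Bag-of-Words feature
        if use_pos:
            features.append("t=" + tag)           # Part-of-Speech feature
    return features


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, examples):
        # examples: list of (feature_list, sense_label) pairs
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocab.update(features)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, features):
        best_sense, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for sense, count in self.sense_counts.items():
            # log P(sense) + sum of log P(feature | sense)
            score = math.log(count / self.total)
            denom = sum(self.feature_counts[sense].values()) + vocab_size
            for feature in features:
                if feature in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feature_counts[sense][feature] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy, hypothetical training contexts for the ambiguous English word "bank".
train = [
    ([("deposit", "VB"), ("money", "NN"), ("account", "NN")], "finance"),
    ([("loan", "NN"), ("interest", "NN"), ("money", "NN")], "finance"),
    ([("river", "NN"), ("water", "NN"), ("fishing", "VBG")], "river"),
    ([("muddy", "JJ"), ("river", "NN"), ("shore", "NN")], "river"),
]

model = NaiveBayesWSD().fit([(extract_features(ctx), sense) for ctx, sense in train])
prediction = model.predict(extract_features([("money", "NN"), ("loan", "NN")]))
print(prediction)
```

Passing `use_bow=False` or `use_pos=False` to `extract_features` gives the single-feature configurations, which is one way to compare BoW-only, PoS-only, and combined setups on the same data.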