A Naïve Bayes Approach for working on Gurmukhi Word Sense Disambiguation

2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). Pub Date: 2017-09-01. DOI: 10.1109/ICRITO.2017.8342465

Himdweep Walia, A. Rana, Vineet Kansal


Citations: 32

Abstract


A Naïve Bayes Approach for working on Gurmukhi Word Sense Disambiguation

Natural Language Processing (NLP) enables communication between humans and machines. A major problem in NLP has been Word Sense Disambiguation (WSD): the process of identifying which of a word's multiple meanings is intended in a given usage. Much work is under way in this field, especially for English and other European languages, and in recent years significant work has also been done for Indian regional languages. Punjabi is an Indian regional language, and Gurmukhi is its script. WSD uses three approaches: knowledge-based, corpus-based, and hybrid. The corpus-based approach can be further divided into supervised and unsupervised approaches. Of the many algorithms implemented under the supervised approach, Naïve Bayes has shown higher accuracy in WSD. For this paper we have used the Punjabi Corpora (obtained from the Evaluations and Language Resources Distribution Agency, Paris, France), which has been sense-tagged for 100 words.
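The abstract names Naïve Bayes as the supervised classifier but does not state which context features or smoothing the authors used. As a rough illustration only, the sketch below assumes bag-of-words context features and add-one (Laplace) smoothing, and picks the sense s that maximizes log P(s) plus the sum of log P(w | s) over the context words w. The function names and the English toy data for the ambiguous word "bank" are hypothetical stand-ins; they are not taken from the paper or from the Punjabi corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-tagged examples for one ambiguous word ("bank").
# Each item is (sense label, context words around the ambiguous word).
TOY_EXAMPLES = [
    ("finance", ["deposit", "money", "account"]),
    ("finance", ["loan", "money", "interest"]),
    ("river", ["water", "fishing", "shore"]),
    ("river", ["water", "boat", "mud"]),
]


def train(examples):
    """Count senses and per-sense context words from tagged examples."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        for word in context:
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def disambiguate(context, model):
    """Return the sense with the highest Naive Bayes log score."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        # Log prior: fraction of training examples tagged with this sense.
        score = math.log(count / total)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    model = train(TOY_EXAMPLES)
    print(disambiguate(["money", "loan"], model))
    print(disambiguate(["water", "boat"], model))
```

Working in log space avoids numeric underflow when the context is long, and the smoothing keeps a single unseen word from driving a sense's probability to zero. A real system for Gurmukhi text would additionally need tokenization and a sense inventory for the script, which this sketch does not attempt.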

