Lexicon based sentiment analysis system for Malayalam language

2017 International Conference on Computing Methodologies and Communication (ICCMC), Pub Date: 2017-07-01, DOI: 10.1109/ICCMC.2017.8282571

M. Ashna, Ancy K Sunny

Citations: 8

Abstract

Sentiment analysis is a natural language processing task that mines information from various text forms, such as reviews, news, and blogs, and classifies them by polarity as positive, negative, or neutral. Mining sentiment in Malayalam comes with many issues and challenges. Compared to English, Malayalam is a free word order and morphologically rich language, which adds complexity when handling user-generated content. Much of the research on Malayalam sentiment analysis has used supervised learning techniques. Although supervised learning methods provide better accuracy than the dictionary-based approach, they cannot perform well without sufficient training examples, and their accuracy is directly related to the quality of the training corpus. In the dictionary-based approach, a sentiment lexicon is created from a pre-annotated seed list of words together with their synonyms and antonyms obtained from WordNet, and this lexicon is then used to classify sentiment. Compared to supervised learning techniques, the dictionary-based approach takes less processing time. However, no sentiment lexicon is readily available for the Malayalam language, so one must be created before lexicon-based sentiment analysis can be performed. In this work, a lexicon-based document-level sentiment analysis system is proposed for the Malayalam language, and the dictionary-based approach is used to develop the Malayalam sentiment lexicon. This approach was chosen because dictionary-based methods are typically more efficient than other approaches and include all the words. In addition, the dictionary approach is not domain-specific, which means it is applicable to all domains. The proposed system achieves an accuracy of 87.5% for sentence-level classification and 90% for document-level classification.
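The dictionary-based pipeline summarized in the abstract works in three steps: start from a seed list, expand it with synonyms and antonyms, then classify sentences and documents by polarity. The Python sketch below illustrates that general idea. It makes several assumptions and is not the authors' implementation:

- `THESAURUS` is a hypothetical toy stand-in for a WordNet-style resource.
- English words stand in for Malayalam tokens.
- The expansion depth and the scoring rules are assumptions chosen for clarity. Each sentence's score is the sum of its +1/-1 word polarities. Each document's score is the sum of its sentence labels. A score of zero counts as neutral.

```python
from collections import deque

# Hypothetical toy stand-in for a WordNet-style resource: word -> (synonyms, antonyms).
# English words are placeholders for Malayalam tokens; tuples keep expansion order deterministic.
THESAURUS: dict[str, tuple[tuple[str, ...], tuple[str, ...]]] = {
    "good": (("fine", "nice"), ("bad",)),
    "nice": (("pleasant",), ()),
    "bad": (("poor", "awful"), ("good",)),
    "awful": (("terrible",), ()),
}


def expand_lexicon(
    seeds: dict[str, int],
    thesaurus: dict[str, tuple[tuple[str, ...], tuple[str, ...]]],
    max_depth: int = 2,
) -> dict[str, int]:
    """Grow a +1/-1 polarity lexicon from annotated seed words, breadth-first.

    Synonyms inherit the polarity of the word they were reached from; antonyms
    receive the opposite polarity. The first polarity assigned to a word is kept.
    """
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # words max_depth hops from a seed are kept but not expanded
        synonyms, antonyms = thesaurus.get(word, ((), ()))
        neighbours = [(s, 1) for s in synonyms] + [(a, -1) for a in antonyms]
        for neighbour, sign in neighbours:
            if neighbour not in lexicon:
                lexicon[neighbour] = lexicon[word] * sign
                queue.append((neighbour, depth + 1))
    return lexicon


def _label(score: int) -> str:
    """Map a numeric score to a polarity label; zero counts as neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_sentence(sentence: str, lexicon: dict[str, int]) -> str:
    """Sum the polarities of known words (naive whitespace tokenization)."""
    tokens = [tok.strip(".,!?;:") for tok in sentence.lower().split()]
    return _label(sum(lexicon.get(tok, 0) for tok in tokens))


def classify_document(sentences: list[str], lexicon: dict[str, int]) -> str:
    """Aggregate sentence labels (+1 / -1 / 0) and take the sign of the total."""
    value = {"positive": 1, "negative": -1, "neutral": 0}
    return _label(sum(value[classify_sentence(s, lexicon)] for s in sentences))


if __name__ == "__main__":
    lexicon = expand_lexicon({"good": 1, "bad": -1}, THESAURUS)
    print(classify_sentence("The service was pleasant and nice.", lexicon))  # positive
    review = ["The food was terrible.", "The staff were nice.", "Overall an awful visit."]
    print(classify_document(review, lexicon))  # negative
```

Because Malayalam is morphologically rich, the exact-match lookup in `classify_sentence` would miss inflected forms. A practical system would likely need a stemmer or morphological analyzer before lookup, which this sketch omits.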
