A Novel Approach to Optimize the Performance of Hadoop Frameworks for Sentiment Analysis

Q4 Computer Science
G. Srinivasa, Amith K. Jain, Prithviraj Jain, R. NageshH.
Journal: International Journal of Open Source Software and Processes
Volume: 44 1
Pages: 44-59
Publication date: 2019-10-01
Publication type: Journal Article
DOI: 10.4018/ijossp.2019100103
URL: https://doi.org/10.4018/ijossp.2019100103
Open access: No
Source: Semanticscholar
Citation count: 5

Abstract

Twitter is one of the most popular microblogging services, with millions of active users. Users most often express their views, opinions, thoughts, emotions, or feelings about a particular topic, product, or service that interests or concerns them. This makes Twitter a hub for an enormous volume of data arriving from various sources and, at the same time, a useful platform for understanding the sentiment behind a particular product or, for that matter, anything expressed in tweets. This massive collection is not merely redundant data; it contains useful information. In this context, sentiment analysis of Twitter data gains enormous importance. Sentiment analysis is a good approach for classifying the opinions formulated by individuals (tweeters) into sentiments such as positive, negative, or neutral. Implementing sentiment analysis algorithms with conventional tools leads to high computation time, which makes them less effective. Hence, there is a pressing need for state-of-the-art tools and techniques that make sentiment analysis faster. The Apache Hadoop framework is one such option: it supports distributed data computing and has been commonly adopted for a variety of use cases. In this article, the authors identify factors affecting the performance of sentiment analysis algorithms based on the Hadoop framework and propose an approach for optimizing that performance. The experimental results demonstrate the potential of the proposed approach.
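The abstract does not describe the authors' algorithm or their optimization, so the following is only a minimal sketch of how three-way sentiment classification can be phrased as a map step and a reduce step, the pattern Hadoop distributes across a cluster. The word lists, the function names (`classify`, `mapper`, `reducer`, `run_job`), and the scoring rule are all hypothetical and chosen for illustration; they are not taken from the article.

```python
from itertools import groupby

# Hypothetical toy lexicon, for illustration only; not taken from the article.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def classify(tweet):
    """Label a tweet by counting lexicon hits (an assumed, simplistic rule)."""
    words = [w.strip(".,!?#@\"'").lower() for w in tweet.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mapper(tweets):
    """Map step: emit one (sentiment, 1) pair per tweet."""
    for tweet in tweets:
        yield classify(tweet), 1


def reducer(pairs):
    """Reduce step: sum the counts for each sentiment label.

    On a cluster the framework sorts and groups pairs by key between the two
    steps; sorted() stands in for that shuffle in this single-process sketch.
    """
    for label, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield label, sum(count for _, count in group)


def run_job(tweets):
    """Run the map and reduce steps locally and return counts per label."""
    return dict(reducer(mapper(tweets)))
```

Because each tweet is classified independently in the map step, the input can be split across many workers, which is the property that makes this kind of workload a natural fit for a distributed framework.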
Source journal
CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 16
About the journal: The International Journal of Open Source Software and Processes (IJOSSP) publishes high-quality, peer-reviewed, original research articles on the broad field of open source software and processes. This wide area entails many intriguing questions and facets, including the special development process carried out by a large number of geographically dispersed programmers, community issues such as coordination and communication, the motivations of the participants, and economic and legal issues. Beyond this topic, open source software is an example of a highly distributed innovation process led by its users, so many aspects have relevance beyond the realm of software and its development. In this tradition, IJOSSP also publishes papers on these topics. IJOSSP is a multidisciplinary outlet and welcomes submissions from all relevant fields of research that apply a multitude of research approaches.