A Novel Approach to Optimize the Performance of Hadoop Frameworks for Sentiment Analysis
G. Srinivasa, Amith K. Jain, Prithviraj Jain, R. NageshH.
International Journal of Open Source Software and Processes, pp. 44-59, published 2019-10-01
DOI: 10.4018/ijossp.2019100103 (https://doi.org/10.4018/ijossp.2019100103)
Citations: 5
Abstract
Twitter is one of the most popular microblogging services, with millions of active users. Users most often express their views, opinions, thoughts, emotions, or feelings about a topic, product, or service that interests or concerns them. This makes Twitter a hub for a massive amount of data arriving from various sources and, at the same time, a useful platform for understanding the sentiment behind a particular product or, for that matter, anything expressed in tweets. This data is not merely redundant; it contains useful information, which makes sentiment analysis of Twitter data highly important. Sentiment analysis is a good approach for classifying the opinions formulated by individuals (tweeters) into sentiments such as positive, negative, or neutral. Implementing sentiment analysis algorithms with conventional tools leads to high computation time and is therefore less effective. Hence, there is a need for state-of-the-art tools and techniques that facilitate faster computation for sentiment analysis. The Apache Hadoop framework is one such option: it supports distributed data computing and has been commonly adopted for a variety of use cases. In this article, the authors identify factors affecting the performance of Hadoop-based sentiment analysis algorithms and propose an approach for optimizing that performance. The experimental results demonstrate the potential of the proposed approach.
About the journal:
The International Journal of Open Source Software and Processes (IJOSSP) publishes high-quality, peer-reviewed, original research articles on the broad field of open source software and processes. This wide area entails many intriguing questions and facets, including the distinctive development process carried out by a large number of geographically dispersed programmers, community issues such as coordination and communication, the motivations of the participants, and economic and legal issues. Beyond this, open source software is an example of a highly distributed innovation process led by its users, so many of its aspects have relevance beyond the realm of software and its development. In this tradition, IJOSSP also publishes papers on these topics. IJOSSP is a multidisciplinary outlet and welcomes submissions from all relevant fields of research that apply a multitude of research approaches.