A distributed joint sentiment and topic modeling using Spark for big opinion mining
Esmaeil Zahedi, Zahra Baniasadi, M. Saraee
2017 Iranian Conference on Electrical Engineering (ICEE), 2017-07-20
DOI: 10.1109/IRANIANCEE.2017.7985276
Opinion data are produced rapidly by a large and uncontrolled number of opinion holders across many domains (public, business, politics, etc.). The volume, variety, and velocity of such data call for opinion mining models that scale with the ever-growing volume of opinions while retaining the advantages of probabilistic generative models. In this paper we propose a parallel implementation on Spark of the joint sentiment and topic (JST) model, which discovers topics and sentiments from reviews simultaneously. Spark is an open-source, fast cluster computing framework for large-scale data processing. We describe the implementation of JST on Spark, discuss the benefits of using Spark, and examine the challenges we encountered. We used several Amazon opinion datasets of different sizes (reviews of electronic devices, books, restaurants, DVDs, and kitchen products). In our experiments, the implementation achieves significant speedup and high efficiency, particularly on the larger datasets.
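The abstract does not say how the JST sampler is split across the cluster. A minimal sketch, assuming the approximate distributed Gibbs scheme popularized by AD-LDA (not necessarily the authors' design): documents are divided into shards, each shard runs one collapsed Gibbs sweep against a stale copy of the global sentiment-topic-word counts, and the per-shard count changes are then summed back into the global counts. For each token the sampler draws a sentiment label l and a topic z jointly, with weight proportional to (N[l,z,w] + β)/(N[l,z] + Vβ) · (N[d,l,z] + α)/(N[d,l] + Tα) · (N[d,l] + γ)/(N[d] + Sγ), where all counts exclude the current token. The hyperparameter values, the helper names `sweep_partition`, `merge`, and `train`, and the omission of JST's sentiment-lexicon prior on β are illustrative assumptions. The shards run sequentially in plain Python rather than on Spark.

```python
import copy
import random

# Hypothetical sizes and symmetric priors (not taken from the paper).
S, T = 2, 3                          # sentiment labels, topics per label
ALPHA, BETA, GAMMA = 0.1, 0.01, 0.3


def sweep_partition(doc_ids, docs, assign, doc_counts, g_lzw, g_lz, V, rng):
    """One collapsed Gibbs sweep over one shard against a private copy of the global counts."""
    lzw = copy.deepcopy(g_lzw)           # stale snapshot of sentiment-topic-word counts
    lz = copy.deepcopy(g_lz)
    for d in doc_ids:
        dlz, dl = doc_counts[d]          # per-document counts are owned by this shard only
        for i, w in enumerate(docs[d]):
            l, z = assign[d][i]
            # Remove the current token from every count (the "-t" counts).
            lzw[l][z][w] -= 1; lz[l][z] -= 1; dlz[l][z] -= 1; dl[l] -= 1
            weights = []
            for k in range(S):
                for j in range(T):
                    # JST conditional: word | (k, j), topic | (d, k), sentiment | d.
                    p_word = (lzw[k][j][w] + BETA) / (lz[k][j] + V * BETA)
                    p_topic = (dlz[k][j] + ALPHA) / (dl[k] + T * ALPHA)
                    p_sent = (dl[k] + GAMMA) / (len(docs[d]) - 1 + S * GAMMA)
                    weights.append(p_word * p_topic * p_sent)
            # Index k * T + j maps back to (sentiment k, topic j).
            l, z = divmod(rng.choices(range(S * T), weights=weights)[0], T)
            assign[d][i] = (l, z)
            lzw[l][z][w] += 1; lz[l][z] += 1; dlz[l][z] += 1; dl[l] += 1
    return lzw, lz


def merge(g_lzw, g_lz, shard_results):
    """AD-LDA-style reduce: new global = old global + sum of per-shard deltas."""
    new_lzw, new_lz = copy.deepcopy(g_lzw), copy.deepcopy(g_lz)
    for lzw, lz in shard_results:
        for k in range(S):
            for j in range(T):
                new_lz[k][j] += lz[k][j] - g_lz[k][j]
                for w in range(len(g_lzw[k][j])):
                    new_lzw[k][j][w] += lzw[k][j][w] - g_lzw[k][j][w]
    return new_lzw, new_lz


def train(docs, V, n_shards=2, iters=20, seed=0):
    rng = random.Random(seed)
    g_lzw = [[[0] * V for _ in range(T)] for _ in range(S)]
    g_lz = [[0] * T for _ in range(S)]
    assign, doc_counts = [], []
    for doc in docs:                     # random initial (sentiment, topic) per token
        dlz, dl, a = [[0] * T for _ in range(S)], [0] * S, []
        for w in doc:
            l, z = rng.randrange(S), rng.randrange(T)
            a.append((l, z))
            g_lzw[l][z][w] += 1; g_lz[l][z] += 1; dlz[l][z] += 1; dl[l] += 1
        assign.append(a)
        doc_counts.append((dlz, dl))
    shards = [list(range(p, len(docs), n_shards)) for p in range(n_shards)]
    for _ in range(iters):
        # In Spark each shard would be one partition processed by a task (e.g. via
        # RDD.mapPartitions); here shards run sequentially on the same stale snapshot.
        results = [sweep_partition(s, docs, assign, doc_counts, g_lzw, g_lz, V, rng)
                   for s in shards]
        g_lzw, g_lz = merge(g_lzw, g_lz, results)
    return assign, g_lzw, g_lz
```

For example, `train([[0, 1, 2, 1], [3, 4, 3], [0, 2, 4, 4, 1]], V=5)` returns the per-token (sentiment, topic) assignments together with the merged global counts. Because each token belongs to exactly one shard, the merge leaves the global counts consistent with the assignments; the approximation lies only in each shard sampling against counts that do not reflect other shards' moves during the same sweep.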