An efficient framework of data mining and its analytics on massive streams of big data repositories
D. Disha, B. J. Sowmya, Chetan, S. Seema
2016 IEEE Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), August 2016
DOI: 10.1109/DISCOVER.2016.7806259
Citations: 3
Abstract
Big data consists of large, complex, and growing data sets drawn from multiple independent sources. With the rapid growth of data collection and storage capacity, big data is expanding across all science and engineering domains. The most fundamental challenge for big data applications is to examine these large volumes of data and extract the information or knowledge required for future use, a task that exceeds the storage and processing limits of relational databases. This paper treats Twitter as the big data repository: it dynamically mines recent tweets about the movie Kapoor and Sons and performs data mining operations and analytics on them, addressing the challenges characterized by the HACE theorem. To handle the large volume of tweets, we use the Hadoop MapReduce framework to perform data mining and analytic operations such as data cleansing, data classification, and data clustering. A prediction model for movie reviews is built with a Naive Bayes classifier, and its prediction accuracy is evaluated with a binomial test, since each prediction is either correct or incorrect and therefore follows a Bernoulli distribution. Tweets are clustered on the basis of location and hashtags. Additionally, the privacy of users' tweets is preserved using a data mining anomaly technique, and the results are displayed as intelligent graphs. To evaluate the performance of MapReduce, the predictive analysis is run both with and without MapReduce, and the resulting execution-time comparison graph is used to show that MapReduce is an efficient framework for large volumes of data.
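The cleansing and clustering steps can be pictured as a single MapReduce job. The sketch below simulates the map, shuffle, and reduce phases in plain Python rather than calling Hadoop. The tweet fields (`text`, `location`), the cleansing rules, the sample tweets, and all helper names are hypothetical, and "clustering" here is simple grouping by exact hashtag or location key, not a distance-based clustering algorithm.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical tweet records; the field names are assumptions, not the paper's schema.
TWEETS = [
    {"text": "Loved #KapoorAndSons!! http://t.co/x", "location": "Mumbai"},
    {"text": "#KapoorAndSons was too long #Bollywood", "location": "Delhi"},
    {"text": "Great acting in #kapoorandsons", "location": "Mumbai"},
]

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def clean(text):
    """Data cleansing (assumed rules): drop URLs, lowercase, collapse whitespace."""
    text = URL_RE.sub("", text).lower()
    return " ".join(text.split())


def mapper(tweet):
    """Map phase: emit one (key, cleaned_text) pair per hashtag plus one for the location."""
    cleaned = clean(tweet["text"])
    for tag in HASHTAG_RE.findall(cleaned):
        yield ("hashtag:" + tag, cleaned)
    yield ("location:" + tweet["location"], cleaned)


def shuffle(pairs):
    """Shuffle phase: collect every value emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: summarize each group (cluster) by its size."""
    return key, len(values)


def run_job(tweets):
    pairs = chain.from_iterable(mapper(t) for t in tweets)
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())


print(run_job(TWEETS))
# {'hashtag:kapoorandsons': 3, 'location:Mumbai': 2, 'hashtag:bollywood': 1, 'location:Delhi': 1}
```

On a real cluster the same mapper and reducer logic would typically run as separate tasks (for example via Hadoop Streaming), with the framework itself performing the shuffle across nodes.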
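The abstract does not say which Naive Bayes variant or which features the review classifier uses. A minimal sketch, assuming a bag-of-words multinomial Naive Bayes with Laplace (add-one) smoothing and two hypothetical labels (`positive`, `negative`), shows how such a classifier scores a tweet: it adds the log prior of each class to the log likelihood of each word and picks the class with the highest total. The training tweets, `tokenize`, and the `NaiveBayes` class are illustrative only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled training tweets; the paper's real training data is not given.
TRAIN_DOCS = [
    "loved the movie great acting",
    "brilliant story and great music",
    "boring and too long",
    "worst movie waste of time",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag-of-words assumption)."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Score = log P(class) + sum over known words of log P(word | class).
            score = math.log(count / n_docs)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # words never seen in training carry no evidence
                p = (self.word_counts[label][w] + 1) / (self.total_words[label] + v)
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
print(model.predict("great acting"))          # positive
print(model.predict("boring waste of time"))  # negative
```

Working in log space avoids floating-point underflow when a tweet contains many words, and add-one smoothing keeps a single unseen word/class pair from forcing a class probability to zero.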
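Because each prediction is a Bernoulli trial (correct or incorrect), the number of correct predictions out of n follows a binomial distribution, which is what makes a binomial test applicable. The abstract does not state the null hypothesis, so the sketch below assumes a one-sided test against chance accuracy `p0 = 0.5` for a two-class problem; the function name and the 8-of-10 evaluation counts are hypothetical.

```python
from math import comb


def binomial_test_greater(successes, trials, p0=0.5):
    """Exact one-sided binomial test.

    Under the null hypothesis the number of correct predictions X follows
    Binomial(trials, p0). Returns the p-value P(X >= successes).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical evaluation: 8 of 10 held-out reviews classified correctly.
accuracy = 8 / 10
p_value = binomial_test_greater(8, 10)
print(accuracy, p_value)  # 0.8 0.0546875
```

With only 10 predictions, an accuracy of 0.8 is not significant at the 5% level (p ≈ 0.055), which illustrates why the number of evaluated tweets matters as much as the accuracy figure itself. SciPy's `scipy.stats.binomtest(k, n, p, alternative="greater")` computes the same p-value.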