Shubhangi Chaturvedi, S. Saritha, Animesh Chaturvedi
Spark based Parallel Frequent Pattern Rules for Social Media Data Analytics
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), published 2023-05-01
DOI: 10.1109/CCGridW59191.2023.00039 (https://doi.org/10.1109/CCGridW59191.2023.00039)
Abstract
The number of social media users is increasing, and the volume of data they produce is growing rapidly with it. Mining and analyzing social media data can uncover hidden information that is helpful in decision-making. Predicting co-occurring words, together with a confidence value, can provide deep insight into social media. This paper presents an applied process for mining social media datasets to retrieve frequent patterns (or rules) in a cost-effective time. The retrieved patterns can be useful in making decisions related to social media. The experiments are performed on three social media datasets, and the resulting rules are analyzed while varying the threshold values (minimum support and minimum confidence). Experiments are performed for both Frequent Pattern (FP) Growth and Parallel FP (PFP) Growth on the same datasets, with parallel computation achieved using a scalable Apache Spark environment. The execution time of FP-Growth and PFP-Growth on the same datasets is also reported. The experiments show that the FP-Growth implementation in SPMF requires preprocessing to convert item-sets into transactional databases. This preprocessing is required only once, so the time subsequently needed to generate rules is lower. PFP-Growth, in contrast, requires no preprocessing of the dataset, which saves time because the association rules can be generated directly.
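
To make the two thresholds concrete, the following is a minimal sketch of how minimum support and minimum confidence filter rules over co-occurring words. It is not the paper's pipeline and it is not FP-Growth or PFP-Growth: it counts word pairs by brute force in plain Python, which is only practical for tiny inputs. The function name `mine_pair_rules` and the toy `posts` data are hypothetical and chosen for illustration. It assumes the usual definitions: the support of a pair is the fraction of posts containing both words, and the confidence of a rule A -> B is the pair count divided by the count of A.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force word -> word rules (illustrative only, not FP-Growth)."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for words in transactions:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(words))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough
        # A frequent pair can yield a rule in either direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical, already tokenized posts (not taken from the paper's datasets).
posts = [
    ["spark", "data", "mining"],
    ["spark", "data"],
    ["data", "mining"],
    ["spark", "cluster"],
]

rules = mine_pair_rules(posts, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Raising either threshold shrinks the rule set, which is the effect the experiments examine when varying minimum support and minimum confidence. FP-Growth avoids this exhaustive pair enumeration by compressing the transactions into a prefix tree, and PFP-Growth distributes that work across a cluster.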