Revamped Market-Basket Analysis using In-Memory Computation framework
Thanmayee, H. Prasad
2017 11th International Conference on Intelligent Systems and Control (ISCO)
DOI: 10.1109/ISCO.2017.7855955
Data sets are growing day by day as they are captured by information-sensing devices such as mobile phones, computers, wireless sensor networks, cameras, software logs, weblogs, and remote sensing equipment in fields such as medicine, engineering, science, and many more. These large data sets are now called Big Data. Working with Big Data is not a routine task. Because these large data sets contain hidden information, researchers cannot ignore them, and have not. Data mining is an interdisciplinary field of Computer Science that extracts information or hidden patterns from data. Association rule mining and frequent itemset mining are popular data mining techniques that require the entire data set to be in main memory, but large data sets do not fit into main memory. To overcome this drawback, the Hadoop MapReduce approach is used, which offers the scalability and robustness needed to handle large data sets. Apriori, Eclat, and FP-Growth are well-known Frequent Itemset Mining algorithms, and they have been revised to work with Big Data using Hadoop MapReduce. However, the MapReduce framework stores intermediate data on local disk, so that data must be read back from disk, which results in high latency. To address this issue, Spark follows a general execution model that supports in-memory computing and the optimization of arbitrary operator graphs, so that querying data becomes much faster than with disk-based engines such as MapReduce. This paper therefore focuses on enhancing the performance of Frequent Itemset Mining using the Apache Spark architecture, and studies the performance of this Revamped Market Basket Analysis, based on FP-Growth, by comparing it with BigFIM, a Hadoop MapReduce implementation of the Frequent Itemset Mining task, and also across different datasets.
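To make the frequent itemset mining task concrete, the following is a minimal single-machine sketch in Python. It is not the paper's Spark implementation and it is not FP-Growth: it uses a simple level-wise (Apriori-style) search held entirely in memory, purely for illustration. The function name `frequent_itemsets`, the `min_support` parameter (an absolute transaction count), and the toy baskets are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions. Level-wise (Apriori-style) search; an
    illustrative stand-in, not the FP-Growth method used in the paper."""
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for b in baskets for item in b)
    current = {frozenset([i]): c for i, c in item_counts.items()
               if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build candidate k-itemsets by joining frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune candidates that have any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count how many baskets contain each surviving candidate.
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical toy data: four shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
itemsets = frequent_itemsets(baskets, min_support=2)

# Confidence of the association rule bread -> milk:
# support(bread and milk) / support(bread).
confidence = (itemsets[frozenset(["bread", "milk"])]
              / itemsets[frozenset(["bread"])])
```

With this toy input every single item and every pair is frequent, while the three-item set appears in only one basket and is dropped. A distributed engine would partition the baskets across workers and aggregate the counts, which is where the choice between disk-based and in-memory intermediate data matters.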