Improve Data Mining Techniques with a High-Performance Cluster
H. Fadhil, Zainab Abdulnasser, S. Mohammed
2022 International Conference on Computer and Applications (ICCA), published 2022-12-20
DOI: 10.1109/ICCA56443.2022.10039629 (https://doi.org/10.1109/ICCA56443.2022.10039629)
Reliance on computers and the computing power they provide keeps growing. More data is created every day, and analyzing it increasingly requires cluster computers. Clustering has proven to be a useful approach in data mining, and several recent efforts have applied it to data mining methods. This study uses a Raspberry Pi cluster to run the Apriori algorithm, one of the most widely used algorithms for extracting frequent itemsets from large data sets. The main aim is to build a cluster and provide data analysis capabilities, based on an examination of the major clustering phases, in order to illustrate the power of cluster computing and the applications of data analytics. The Raspberry Pi nodes use the MPI standard (MPICH) and Python multiprocessing to divide a large task among a group of four or more systems and then combine their findings once processing finishes. Load balancing must be taken into account at the data partitioning stage. In our tests, clustering sped up sequential classification by a factor of 10, and performance improved noticeably as more processors were added. We also found that item count had a larger effect on clustering performance than transaction count.
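The partition-then-combine scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`split_evenly`, `count_support`, `next_candidates`, `apriori`), the round-robin partitioning, and the `min_count` threshold are all assumptions introduced here. The `mapper` argument defaults to the built-in `map` so the sketch runs on a single machine; in a setting like the one described, it stands in for whatever distributes the partitions to workers.

```python
from collections import Counter
from functools import partial
from itertools import combinations


def split_evenly(transactions, n_parts):
    """Round-robin split (an assumed, simple load-balancing scheme)."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_support(candidates, partition):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (the Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def apriori(transactions, min_count, n_parts=4, mapper=map):
    """Return {itemset: support count} for itemsets seen in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    partitions = split_evenly(transactions, n_parts)
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    result, k = {}, 1
    while candidates:
        # Each partition is counted independently; this is the step a cluster would distribute.
        partial_counts = mapper(partial(count_support, candidates), partitions)
        total = Counter()
        for counts in partial_counts:
            total.update(counts)  # Counter.update adds counts together
        frequent = {s: n for s, n in total.items() if n >= min_count}
        result.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return result


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["bread", "milk", "butter"],
]
frequent_itemsets = apriori(baskets, min_count=3, n_parts=2)
```

Because `count_support` is a top-level function wrapped in `functools.partial`, the same call should also work with `multiprocessing.Pool(...).map` passed as `mapper`. The sketch also hints at why item count can matter more than transaction count: the work per partition grows linearly with the number of transactions, while the number of candidate itemsets can grow combinatorially with the number of distinct items.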