{"title":"Load Balancing Algorithms for Comparative Pattern Mining","authors":"Boqiang Cao","doi":"10.3103/S0146411625700488","DOIUrl":null,"url":null,"abstract":"<p>For addressing the issues of ineffective mining and memory overflow when dealing with high-dimensional and large-scale datasets with traditional comparative pattern mining, and to further lift the limitation of a single machine’s own hardware, the study proposes a parallel comparative pattern mining algorithm based on Spark cluster environment. By constructing an extended data collection project tree, introducing an optimised decision tree for mining, and improving the related load balancing strategy, effective mining of large-scale and high-dimensional datasets is achieved. Experiments show that the algorithm proposed in the study has a maximum value of 1883 and a minimum value of 1549 for the number of contrasting patterns mined in the small-scale and low-dimensional Mushroom dataset, which is slightly higher than the mining method of strong jump revealed patterns with good classification performance. In the large-scale and high-dimensional dataset US census1990, the overall running time of the algorithm of the study is low compared to the cryptogrowth algorithm (<i>T</i><sub>max</sub> 43.2 min, <i>T</i><sub>min</sub> 18.4 min), and finally the failure request rate of the algorithm itself and the improved and weighted polling algorithms are ompared separately, and the results show that the improved algorithm takes the lowest time of 4%. The experiment showcases that the classification effect of the studied algorithm is good, the load balancing strategy of the improved algorithm is effective, and the overall performance of the algorithm is good.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"59 3","pages":"317 - 327"},"PeriodicalIF":0.5000,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.3103/S0146411625700488","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
For addressing the issues of ineffective mining and memory overflow when dealing with high-dimensional and large-scale datasets with traditional comparative pattern mining, and to further lift the limitation of a single machine’s own hardware, the study proposes a parallel comparative pattern mining algorithm based on Spark cluster environment. By constructing an extended data collection project tree, introducing an optimised decision tree for mining, and improving the related load balancing strategy, effective mining of large-scale and high-dimensional datasets is achieved. Experiments show that the algorithm proposed in the study has a maximum value of 1883 and a minimum value of 1549 for the number of contrasting patterns mined in the small-scale and low-dimensional Mushroom dataset, which is slightly higher than the mining method of strong jump revealed patterns with good classification performance. In the large-scale and high-dimensional dataset US census1990, the overall running time of the algorithm of the study is low compared to the cryptogrowth algorithm (Tmax 43.2 min, Tmin 18.4 min), and finally the failure request rate of the algorithm itself and the improved and weighted polling algorithms are ompared separately, and the results show that the improved algorithm takes the lowest time of 4%. The experiment showcases that the classification effect of the studied algorithm is good, the load balancing strategy of the improved algorithm is effective, and the overall performance of the algorithm is good.
期刊介绍:
Automatic Control and Computer Sciences is a peer reviewed journal that publishes articles on• Control systems, cyber-physical system, real-time systems, robotics, smart sensors, embedded intelligence • Network information technologies, information security, statistical methods of data processing, distributed artificial intelligence, complex systems modeling, knowledge representation, processing and management • Signal and image processing, machine learning, machine perception, computer vision