AS2: Adaptive sorting algorithm selection for heterogeneous workloads and systems
Sangmyung Lee, Byungyoon Lee, Yongseok Son, Kiwook Sohn, Hwajung Kim, Sunggon Kim
Future Generation Computer Systems: The International Journal of eScience, vol. 172, Article 107860, published 2025-04-30. DOI: 10.1016/j.future.2025.107860
https://www.sciencedirect.com/science/article/pii/S0167739X25001554
Abstract
Sorting is increasingly important in modern computing, in systems ranging from small-scale Internet of Things (IoT) devices to supercomputers. To improve sorting performance, different systems adopt various algorithms, including Intro sort, Merge sort, Heap sort, and Insertion sort. However, sorting performance depends on many factors, and our analysis shows that the optimal algorithm varies: no single algorithm consistently outperforms the others. In this paper, we first analyze the data internal factors (data size, distribution, and data type) and external factors (number of threads and hardware platform) that affect sorting algorithm performance. We use widely adopted sorting algorithms such as STL sort and Merge sort, as well as state-of-the-art algorithms such as Ips4o sort and Aips2o sort. In addition to sequential algorithms, we implement Parallel Intro sort and use the parallel versions of the state-of-the-art algorithms with varying numbers of threads. Based on this analysis, we present AS2 (Adaptive Sorting Algorithm Selection), an adaptive sorting algorithm selection model for heterogeneous workloads and systems. Its goal is to determine the optimal algorithm among existing sorting algorithms for a given workload and system. AS2 uses various machine learning (ML) models to build a performance model for each sorting algorithm from the data internal and external factors of various datasets, and then selects the sorting algorithm based on the predicted performance. We evaluate AS2 on a representative dataset that covers various data internal and external factors. The results show that AS2 accurately predicts the performance of the sorting algorithms, with minimum and maximum R² values of 0.83 and 0.99, respectively.
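The abstract does not name the ML models AS2 uses or how the factors are encoded, so the following is only a minimal sketch of the general idea under stated assumptions: one regressor per sorting algorithm predicts sorting time from a feature vector, the selector picks the algorithm with the smallest prediction, and R² scores a model's predictions against measured times. The `Workload` fields, the integer distribution encoding, the log scaling, and the k-nearest-neighbour regressor `KNNTimeModel` are hypothetical stand-ins, not the paper's models.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description mixing internal and external factors."""
    size: int          # internal: number of elements to sort
    distribution: int  # internal: encoded distribution id (hypothetical encoding)
    elem_bytes: int    # internal: width of the element type in bytes
    threads: int       # external: number of sorting threads

    def features(self) -> tuple:
        # Log-scale size and thread count so they do not dominate the distance.
        return (math.log2(self.size), float(self.distribution),
                float(self.elem_bytes), math.log2(self.threads))


class KNNTimeModel:
    """Hypothetical k-nearest-neighbour regressor predicting the time of ONE algorithm."""

    def __init__(self, k: int = 3):
        self.k = k
        self.samples: list = []  # (feature tuple, measured seconds) pairs

    def fit(self, workloads, times):
        self.samples = [(w.features(), t) for w, t in zip(workloads, times)]
        return self

    def predict(self, w: Workload) -> float:
        x = w.features()
        # Average the measured times of the k closest training workloads.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], x))[: self.k]
        return sum(t for _, t in nearest) / len(nearest)


def select_algorithm(models: dict, w: Workload) -> str:
    """Pick the algorithm whose model predicts the shortest sorting time."""
    return min(models, key=lambda name: models[name].predict(w))


def r_squared(actual, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

In use, one would fit a `KNNTimeModel` per candidate algorithm (for example, STL sort, Ips4o sort, or Aips2o sort) on measured timings and pass the resulting `{name: model}` dictionary to `select_algorithm`. k-NN is used here only because it fits in a few lines of standard-library Python; any regressor exposing a `predict` method could be substituted.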
In addition, AS2 selects the optimal algorithm in our evaluation scenario with up to 99.68% accuracy by choosing the algorithm with the shortest predicted sorting time, improving performance by up to 1.83× compared to the state-of-the-art algorithm. We also evaluate AS2 on a real-world dataset, where it selects the optimal algorithm with 87.50% accuracy.
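The abstract reports selection accuracy and speedup without spelling out how they are computed. Assuming accuracy is the fraction of evaluation scenarios in which the algorithm with the shortest predicted time is also the fastest when measured, and speedup is a fixed baseline's measured time divided by the selected algorithm's measured time, both can be computed as in this sketch. The function names and the `{algorithm name: seconds}` input format are hypothetical.

```python
def fastest(times: dict) -> str:
    """Name of the algorithm with the smallest time in a hypothetical {name: seconds} mapping."""
    return min(times, key=times.get)


def selection_accuracy(predicted: list, measured: list) -> float:
    """Assumed definition: share of scenarios where the predicted-fastest
    algorithm is also the measured-fastest one."""
    hits = sum(fastest(p) == fastest(m) for p, m in zip(predicted, measured))
    return hits / len(measured)


def speedups(predicted: list, measured: list, baseline: str) -> list:
    """Assumed definition: measured speedup of the selected algorithm over a
    fixed baseline algorithm, one value per scenario."""
    return [m[baseline] / m[fastest(p)] for p, m in zip(predicted, measured)]
```

Note that a misprediction can yield a speedup below 1 when the selected algorithm turns out slower than the baseline, which is why accuracy and speedup are reported separately.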
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.