{
  "title": "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting",
  "authors": "Bin Dong, S. Byna, Kesheng Wu",
  "doi": "10.1145/2907294.2907300",
  "journal": {
    "name": "Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing",
    "volume": "48 1"
  },
  "publicationDate": "2016-05-31",
  "publicationTypes": "Journal Article",
  "citationCount": "10",
  "platform": "Semanticscholar",
  "ListUrlMain": "https://doi.org/10.1145/2907294.2907300"
}
Parallel sorting is an essential algorithm in large-scale data analytics on distributed-memory systems. As the number of processes increases, existing parallel sorting algorithms can become inefficient because of unbalanced workloads. A frequent cause of load imbalance is data skew, which is common in application data sets from physics, biology, and earth and planetary sciences. In this work, we introduce a new scalable, dynamic, skew-aware parallel sorting algorithm named SDS-Sort. It uses a skew-aware partitioning method to guarantee a tighter upper bound on the workload of each process. To improve load balance among parallel processes, existing algorithms usually add extra variables to the sorting key, which increases the time needed to complete the sort. SDS-Sort allows a user to select any sorting key without sacrificing performance. SDS-Sort also provides optimizations, including adaptive local merging, overlapping of data exchange with data processing, and dynamic selection of data processing algorithms for different hardware configurations and for partially ordered data. SDS-Sort uses local-sampling-based partitioning to further reduce its overhead. We tested SDS-Sort extensively on Edison, a Cray XC30 supercomputer. Timing measurements show that SDS-Sort can scale to 130K CPU cores and deliver a sorting throughput of 117 TB/min. In tests with real application data from large science projects, SDS-Sort outperforms HykSort, a state-of-the-art parallel sorting algorithm, by 3.4x.
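
The abstract does not spell out the partitioning rule, so the following is only an illustrative sketch of the general idea, not the SDS-Sort algorithm itself. It simulates four "processes" in a single Python program, picks splitters from local regular samples, and compares a plain splitter-based partition with one that spreads records equal to a repeated splitter across all buckets that may legally hold them. All function names, the round-robin rule, and the toy data set (one key, 5, making up about 70% of the records) are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def regular_samples(sorted_keys, count):
    """Pick `count` evenly spaced keys from an already sorted list."""
    n = len(sorted_keys)
    if n == 0:
        return []
    return [sorted_keys[n * (i + 1) // (count + 1)] for i in range(count)]


def choose_splitters(local_chunks):
    """Local sampling: each chunk contributes p-1 samples, then p-1
    global splitters are drawn from the sorted pool of samples."""
    p = len(local_chunks)
    samples = []
    for chunk in local_chunks:
        samples.extend(regular_samples(sorted(chunk), p - 1))
    samples.sort()
    return regular_samples(samples, p - 1)


def partition_naive(keys, splitters):
    """Every copy of a key lands in one bucket, so a heavy key
    overloads a single destination."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for k in keys:
        buckets[bisect_right(splitters, k)].append(k)
    return buckets


def partition_skew_aware(keys, splitters):
    """Hypothetical skew-aware rule: a key equal to splitters[lo:hi]
    may go to any bucket in lo..hi without breaking global order,
    so its copies are dealt out round-robin over that range."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    seen = defaultdict(int)  # copies of each splitter-valued key so far
    for k in keys:
        lo = bisect_left(splitters, k)
        hi = bisect_right(splitters, k)
        if lo == hi:
            buckets[lo].append(k)  # key differs from every splitter
        else:
            buckets[lo + seen[k] % (hi - lo + 1)].append(k)
            seen[k] += 1
    return buckets


def demo():
    p = 4
    # Assumed toy input: each chunk holds 70 copies of key 5 plus 30 distinct keys.
    chunks = [[5] * 70 + list(range(r * 30, r * 30 + 30)) for r in range(p)]
    splitters = choose_splitters(chunks)
    all_keys = [k for chunk in chunks for k in chunk]
    naive = partition_naive(all_keys, splitters)
    aware = partition_skew_aware(all_keys, splitters)
    # Sorting each bucket and concatenating must give a globally sorted result.
    merged = [k for bucket in aware for k in sorted(bucket)]
    assert merged == sorted(all_keys)
    return [len(b) for b in naive], [len(b) for b in aware]


if __name__ == "__main__":
    naive_sizes, aware_sizes = demo()
    print("naive bucket sizes:     ", naive_sizes)
    print("skew-aware bucket sizes:", aware_sizes)
```

In this toy setting the heavy key shows up as a repeated splitter, which is the signal the sketch uses to spread its copies over several buckets while leaving the sorting key itself unchanged. A distributed implementation would apply the same decision per process and exchange the buckets over the network, which the sketch does not attempt.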