PALM: Progress- and Locality-Aware Adaptive Task Migration for Efficient Thread Packing
Jinsu Park, Seongbeom Park, Myeonggyun Han, Woongki Baek
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2021-05-01
DOI: 10.1109/IPDPS49936.2021.00041 (https://doi.org/10.1109/IPDPS49936.2021.00041)
Abstract
Thread packing (TP) is an effective and widely used technique that significantly improves the efficiency of parallel systems by dynamically controlling the number of cores allocated to multithreaded applications according to requirements such as performance and energy efficiency. Despite extensive prior work on TP, little has been done to investigate and address the performance inefficiencies that arise across parallel systems and applications with different characteristics. To bridge this gap, we investigate the performance inefficiencies of TP using a wide range of parallel applications and system configurations and identify their root causes. Guided by the in-depth performance characterization results, we propose PALM, progress- and locality-aware adaptive task migration for efficient TP. Through quantitative evaluation, we demonstrate that PALM achieves significantly higher performance and lower energy consumption than TP across various synchronization-intensive applications and system configurations, delivers performance and energy consumption comparable to those of the thread reduction technique, and considerably improves the efficiency of dynamic server consolidation and the performance under power capping.