Criticality stacks: identifying critical threads in parallel programs using synchronization behavior

Proceedings of the 40th Annual International Symposium on Computer Architecture
Pub Date: 2013-06-23
DOI: 10.1145/2485922.2485966

Kristof Du Bois, Stijn Eyerman, Jennifer B. Sartor, L. Eeckhout


Citations: 79

Abstract

Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore performance while saving energy. Due to synchronization, certain threads make others wait, because they hold a lock or have yet to reach a barrier. We call these critical threads, i.e., threads whose performance is determinative of program performance as a whole. Identifying these threads can reveal numerous optimization opportunities, for the software developer and for hardware. In this paper, we propose a new metric for assessing thread criticality, which combines both how much time a thread is performing useful work and how many co-running threads are waiting. We show how thread criticality can be calculated online with modest hardware additions and with low overhead. We use our metric to create criticality stacks that break total execution time into each thread's criticality component, allowing for easy visual analysis of parallel imbalance. To validate our criticality metric, and demonstrate it is better than previous metrics, we scale the frequency of the most critical thread and show it achieves the largest performance improvement. We then demonstrate the broad applicability of criticality stacks by using them to perform three types of optimizations: (1) program analysis to remove parallel bottlenecks, (2) dynamically identifying the most critical thread and accelerating it using frequency scaling to improve performance, and (3) showing that accelerating only the most critical thread allows for targeted energy reduction.
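The abstract describes the metric only in words: it combines how long a thread does useful work with how many co-running threads are waiting, and the per-thread values break down total execution time. One plausible reading, sketched below, splits each time interval evenly among the threads that are still running during it, so a thread accumulates more criticality when fewer other threads are active. The function name `criticality_stack`, the interval representation, and the even-split rule are assumptions made for illustration; the abstract does not give the formula or the hardware mechanism.

```python
from collections import defaultdict


def criticality_stack(intervals):
    """Return a {thread_id: criticality} mapping.

    `intervals` is a hypothetical trace: a list of (duration, active_ids)
    pairs, where `active_ids` is the set of threads doing useful work
    (not blocked on a lock or barrier) during that interval.
    """
    criticality = defaultdict(float)
    for duration, active_ids in intervals:
        if not active_ids:
            continue  # nobody is running; nothing to attribute
        # Assumed rule: share the interval equally among running threads,
        # so fewer running threads means a larger share for each.
        share = duration / len(active_ids)
        for tid in active_ids:
            criticality[tid] += share
    return dict(criticality)


# Toy trace: four threads run together, then two wait at a barrier,
# then only thread 0 is left running while the rest wait.
intervals = [
    (4.0, {0, 1, 2, 3}),
    (2.0, {0, 1}),
    (2.0, {0}),
]
stack = criticality_stack(intervals)
total_time = sum(duration for duration, _ in intervals)
most_critical = max(stack, key=stack.get)
```

In this toy trace the per-thread values sum to the total run time of the trace, which matches the abstract's statement that a criticality stack breaks total execution time into per-thread components. Thread 0, which keeps the others waiting longest, ends up with the largest component and would be the candidate for frequency scaling.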
