Pandia: comprehensive contention-sensitive thread placement

Proceedings of the Twelfth European Conference on Computer Systems
Publication date: 2017-04-23
DOI: 10.1145/3064176.3064177

D. Goodman, Georgios Varisteas, T. Harris

Citation count: 19

Abstract

Pandia is a system for modeling the performance of in-memory parallel workloads. It generates a description of a workload from a series of profiling runs and combines this with a description of the machine's hardware to model the workload's performance across different thread counts and different placements of those threads. The approach is "comprehensive" in that it accounts for contention at multiple resources, such as processor functional units and memory channels. The points of contention for a workload can shift between resources as the degree of parallelism and the thread placement change. Pandia accounts for these shifts and provides a close correspondence between predicted and actual performance. Testing a set of 22 benchmarks on two-socket Intel machines fitted with chips ranging from Sandy Bridge to Haswell, we see median differences of 1.05% to 0% between the fastest predicted placement and the fastest measured placement, and median errors of 8% to 4% across all placements. Pandia can be used to optimize the performance of a given workload: for instance, identifying whether multiple processor sockets should be used, and whether the workload benefits from using multiple threads per core. In addition, Pandia can identify opportunities for reducing resource consumption where additional resources are not matched by additional performance, for instance by limiting a workload to a small number of cores when its scaling is poor.
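The abstract's central observation is that the resource limiting a workload can shift as the thread count and placement change. A deliberately simplified bottleneck model can illustrate this. The sketch below is not Pandia's model. It is a hypothetical toy in which each thread's demand on two shared resources, a core's execution capacity (shared by hyperthreads on that core) and a socket's memory bandwidth, is divided by that resource's capacity, and the most oversubscribed resource slows every thread proportionally. All names and numbers (`Machine`, `Workload`, `Placement`, `predict_throughput`, `cheapest_near_best`, 40 GB/s per socket, and so on) are illustrative assumptions. The toy also ignores cross-socket traffic, caches, and synchronization.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical, simplified machine description (not Pandia's format).
@dataclass(frozen=True)
class Machine:
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    core_capacity: float   # execution capacity of one core; 1.0 = one thread at full speed
    socket_mem_bw: float   # memory bandwidth available on each socket, in GB/s

# Hypothetical per-thread demands, as if measured with one thread running alone.
@dataclass(frozen=True)
class Workload:
    core_demand: float     # fraction of a core's capacity one thread uses
    mem_bw_demand: float   # GB/s one thread consumes at full speed

# A symmetric placement: same cores and threads per core on every socket used.
@dataclass(frozen=True)
class Placement:
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def threads(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core

def loads(m: Machine, w: Workload, p: Placement) -> dict[str, float]:
    """Demand on each shared resource divided by its capacity (>1.0 = oversubscribed)."""
    return {
        # Hyperthreads on the same core share its functional units.
        "core": p.threads_per_core * w.core_demand / m.core_capacity,
        # Threads on a socket share that socket's memory channels (assumes local data).
        "memory": p.cores_per_socket * p.threads_per_core * w.mem_bw_demand / m.socket_mem_bw,
    }

def bottleneck(m: Machine, w: Workload, p: Placement) -> str:
    """Name of the most oversubscribed resource, or "none" if nothing is saturated."""
    name, load = max(loads(m, w, p).items(), key=lambda kv: kv[1])
    return name if load > 1.0 else "none"

def predict_throughput(m: Machine, w: Workload, p: Placement) -> float:
    """Throughput in units of one thread running alone at full speed."""
    # The most oversubscribed resource slows every thread by its oversubscription factor.
    slowdown = max(1.0, *loads(m, w, p).values())
    return p.threads / slowdown

def placements(m: Machine):
    """Every symmetric placement that fits on the machine."""
    for s, c, t in product(range(1, m.sockets + 1),
                           range(1, m.cores_per_socket + 1),
                           range(1, m.threads_per_core + 1)):
        yield Placement(s, c, t)

def cheapest_near_best(m: Machine, w: Workload, tolerance: float = 0.05) -> Placement:
    """Fewest threads whose predicted throughput is within a fraction `tolerance` of the best."""
    options = list(placements(m))
    best = max(predict_throughput(m, w, p) for p in options)
    good = [p for p in options if predict_throughput(m, w, p) >= (1 - tolerance) * best]
    # Prefer fewer threads; break ties by higher predicted throughput.
    return min(good, key=lambda p: (p.threads, -predict_throughput(m, w, p)))

if __name__ == "__main__":
    m = Machine(2, 8, 2, core_capacity=1.0, socket_mem_bw=40.0)
    mem_bound = Workload(core_demand=0.6, mem_bw_demand=10.0)
    print(bottleneck(m, mem_bound, Placement(1, 1, 2)))   # core
    print(bottleneck(m, mem_bound, Placement(1, 8, 1)))   # memory
    print(cheapest_near_best(m, mem_bound))
    # Placement(sockets=2, cores_per_socket=4, threads_per_core=1)
```

In this toy setting, the memory-bound workload's bottleneck moves from the core to the memory channels as threads are spread across more cores of one socket. Its predicted peak is reached with four cores on each of two sockets, so the remaining cores and second hyperthreads add no predicted throughput. This mirrors, in miniature, the resource-reduction question the abstract describes. A workload with low memory demand (for example, `mem_bw_demand=1.0`) is instead predicted to benefit from both sockets and two threads per core.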
