RICH: implementing reductions in the cache hierarchy

Proceedings of the 34th ACM International Conference on Supercomputing | Pub Date: 2020-06-29 | DOI: 10.1145/3392717.3392736

Vladimir Dimic, Miquel Moretó, Marc Casas, Jan Ciesko, M. Valero


Citations: 1

Abstract

Reductions are a frequent algorithmic pattern in high-performance and scientific computing. Sophisticated techniques are needed to ensure their correct and scalable concurrent execution on modern processors. Reductions on large arrays are the most demanding case, because traditional approaches scale poorly and are therefore not always applicable. To address these challenges, we propose RICH, a runtime-assisted solution that relies on architectural and parallel programming model extensions. RICH updates the reduction variable directly in the cache hierarchy with the help of added in-cache functional units. Our programming model extensions fit the most relevant parallel programming solutions for shared-memory environments, such as OpenMP. RICH does not modify the ISA, which allows the use of algorithms with reductions from pre-compiled external libraries. Experiments show that our solution achieves an average performance improvement of 11.2% over state-of-the-art hardware-based approaches, while introducing 2.4% area and 3.8% power overhead.
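
To make the "reduction on a large array" pattern concrete, the following is a minimal sketch of two conventional software approaches in standard OpenMP: privatization through an array-section `reduction` clause, and a shared array updated with atomics. It does not show RICH or its programming model extensions, which the abstract does not spell out. The function names `histogram_reduce` and `histogram_atomic` are hypothetical and chosen only for illustration, and the array-section syntax assumes a compiler supporting OpenMP 4.5 or later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): counts key occurrences.
   Every keys[i] must lie in [0, nbins). */
void histogram_reduce(const int *keys, size_t n, long *hist, int nbins)
{
    assert(nbins > 0);
    /* Privatization: each thread works on its own zero-initialised copy
       of hist[0..nbins); the copies are added into the original array
       when the loop finishes. Memory use grows with threads * nbins. */
    #pragma omp parallel for reduction(+ : hist[:nbins])
    for (size_t i = 0; i < n; i++) {
        hist[keys[i]] += 1;
    }
}

/* Hypothetical helper (not from the paper): same result, but all threads
   share one array and every single update is made atomic. */
void histogram_atomic(const int *keys, size_t n, long *hist, int nbins)
{
    assert(nbins > 0);
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        /* No private copies, but contended bins serialise the updates. */
        #pragma omp atomic update
        hist[keys[i]] += 1;
    }
}
```

Both functions add to whatever `hist` already holds, so the caller should zero the array first. Compiled without OpenMP support, the pragmas are ignored and the loops run serially with the same result. The trade-off between per-thread copies and per-update synchronization is the kind of scalability pressure on large arrays that the abstract refers to.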
