Optimizing irregular applications for energy and performance on the Tilera many-core architecture

Proceedings of the 12th ACM International Conference on Computing Frontiers. Published: 2015-05-06. DOI: 10.1145/2742854.2742865

D. Chavarría-Miranda, Ajay Panyala, M. Halappanavar, J. Manzano, Antonino Tumeo


Citations: 4

Abstract

Optimizing applications simultaneously for energy and performance is a complex problem. High-performance, parallel, irregular applications are notoriously hard to optimize because of their data-dependent memory accesses, lack of structured locality, and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics, and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy-efficient, many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations that improve performance and reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.
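The three-dimensional design space described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the option names in `SPACE`, the `tune` and `configurations` helpers, and the `synthetic_energy` cost function are hypothetical and do not come from the paper, and the exhaustive search stands in for the search strategies a real auto-tuner such as OpenTuner would apply. In an actual study, the cost function would build and run the application with the chosen configuration and return a measured energy value.

```python
import itertools

# Hypothetical design space with one entry per dimension named in the abstract.
# The option names are illustrative placeholders, not taken from the paper.
SPACE = {
    "layout": ["default", "hashed", "local"],      # memory layout scheme
    "unroll": [1, 2, 4],                           # compiler-based transformation
    "schedule": ["static", "dynamic", "guided"],   # parallel loop schedule
}


def configurations(space):
    """Yield every combination of options as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def tune(space, measure):
    """Evaluate every configuration and return (best_config, best_cost)."""
    best_config, best_cost = None, float("inf")
    for config in configurations(space):
        cost = measure(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


def synthetic_energy(config):
    """Stand-in for a real energy measurement; all numbers are made up."""
    layout_cost = {"default": 10.0, "hashed": 8.0, "local": 9.0}[config["layout"]]
    unroll_cost = {1: 3.0, 2: 2.0, 4: 2.5}[config["unroll"]]
    schedule_cost = {"static": 4.0, "dynamic": 3.0, "guided": 3.5}[config["schedule"]]
    return layout_cost + unroll_cost + schedule_cost


if __name__ == "__main__":
    best, cost = tune(SPACE, synthetic_energy)
    print(best, cost)
```

Exhaustive enumeration is only practical for a toy space like this one (27 configurations). Once each dimension has many options and each measurement requires a full application run, a guided search becomes necessary, which is the role an auto-tuning framework plays.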
