A compiler framework for extracting superword level parallelism

Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. Pub Date: 2012-06-11. DOI: 10.1145/2254064.2254106

Jun Liu, Yuanrui Zhang, Ohyoung Jang, W. Ding, M. Kandemir

Citations: 48

Abstract

SIMD (single-instruction multiple-data) instruction set extensions are quite common today in both high performance and embedded microprocessors, and enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance savings are possible when SLP is exploited, placing SIMD instructions in an application code manually can be very difficult and error prone. In this paper, we propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, of which the primary goals are to increase SIMD parallelism and, more importantly, capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory space such that the price of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2% over a state-of-the-art SLP optimization algorithm.
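
To make the terms "statement grouping" and "superword reuse" concrete, here is a minimal toy sketch in Python. It is not the algorithm from the paper: it only packs isomorphic, independent scalar statements on consecutive array indices into 4-lane groups, then counts how often a packed source operand is shared by more than one group. The names `Stmt`, `group_statements`, `count_superword_reuses`, and the lane width `LANES = 4` are assumptions made for illustration, and dependence analysis, scheduling, and data layout are left out entirely.

```python
from collections import defaultdict
from typing import NamedTuple

LANES = 4  # assumed superword width in elements (hypothetical target)


class Stmt(NamedTuple):
    """One scalar statement: dest[index] = srcs[0][index] <op> srcs[1][index]."""
    dest: str     # destination array name
    index: int    # element index used by every operand
    op: str       # operator symbol, e.g. "+" or "*"
    srcs: tuple   # source array names


def group_statements(stmts, lanes=LANES):
    """Greedily pack isomorphic statements on consecutive indices.

    Assumes the statements are mutually independent (no dependence check).
    Returns (groups, leftovers); each group is (dest, op, srcs, base_index).
    """
    buckets = defaultdict(list)
    for s in stmts:
        # Statements are isomorphic here if they share dest, op, and sources.
        buckets[(s.dest, s.op, s.srcs)].append(s.index)

    groups, leftovers = [], []
    for (dest, op, srcs), idxs in buckets.items():
        idxs.sort()
        i = 0
        while i < len(idxs):
            run = idxs[i:i + lanes]
            if run == list(range(run[0], run[0] + lanes)):
                groups.append((dest, op, srcs, run[0]))  # one superword statement
                i += lanes
            else:
                leftovers.append(Stmt(dest, idxs[i], op, srcs))  # stays scalar
                i += 1
    return groups, leftovers


def count_superword_reuses(groups):
    """Count uses of a packed source operand beyond its first use."""
    uses = defaultdict(int)
    for _dest, _op, srcs, base in groups:
        for arr in srcs:
            uses[(arr, base)] += 1
    return sum(n - 1 for n in uses.values())


# Example input: a[i] = b[i] + c[i] and e[i] = b[i] * d[i] for i in 0..3,
# plus one isolated statement that cannot be packed.
stmts = [Stmt("a", i, "+", ("b", "c")) for i in range(LANES)]
stmts += [Stmt("e", i, "*", ("b", "d")) for i in range(LANES)]
stmts.append(Stmt("f", 7, "+", ("b", "c")))

groups, leftovers = group_statements(stmts)
reuses = count_superword_reuses(groups)
```

In this example the eight packable statements become two superword statements, the statement on `f[7]` stays scalar, and the packed operand `b[0..3]` is used by both groups, so it counts as one reuse. A framework like the one the abstract describes would choose groupings and an execution order that raise this kind of reuse, which this sketch only measures after the fact.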
