数据聚合的未来矢量微处理器扩展

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2016-06-18 DOI:10.1145/3007787.3001182

Timothy Hayes, Oscar Palomar, O. Unsal, A. Cristal, M. Valero

{"title":"数据聚合的未来矢量微处理器扩展","authors":"Timothy Hayes, Oscar Palomar, O. Unsal, A. Cristal, M. Valero","doi":"10.1145/3007787.3001182","DOIUrl":null,"url":null,"abstract":"As the rate of annual data generation grows exponentially, there is a demand to aggregate and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased and further performance must come from exploiting parallelism. Single instruction-multiple data (SIMD) instruction sets offer a highly efficient and scalable way of exploiting data-level parallelism (DLP). While microprocessors originally offered very simple SIMD support targeted at multimedia applications, these extensions have been growing both in width and functionality. Observing this trend, we use a simulation framework to model future SIMD support and then propose and evaluate five different ways of vectorising data aggregation. We find that although data aggregation is abundant in DLP, it is often too irregular to be expressed efficiently using typical SIMD instructions. Based on this observation, we propose a set of novel algorithms and SIMD instructions to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Our proposed solution can dynamically choose the optimal algorithm in the majority of cases and achieves speedups between 2.7x and 7.6x over a scalar baseline.","PeriodicalId":6634,"journal":{"name":"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","volume":"167 1","pages":"418-430"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Future Vector Microprocessor Extensions for Data Aggregations\",\"authors\":\"Timothy Hayes, Oscar Palomar, O. Unsal, A. Cristal, M. Valero\",\"doi\":\"10.1145/3007787.3001182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the rate of annual data generation grows exponentially, there is a demand to aggregate and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased and further performance must come from exploiting parallelism. Single instruction-multiple data (SIMD) instruction sets offer a highly efficient and scalable way of exploiting data-level parallelism (DLP). While microprocessors originally offered very simple SIMD support targeted at multimedia applications, these extensions have been growing both in width and functionality. Observing this trend, we use a simulation framework to model future SIMD support and then propose and evaluate five different ways of vectorising data aggregation. We find that although data aggregation is abundant in DLP, it is often too irregular to be expressed efficiently using typical SIMD instructions. Based on this observation, we propose a set of novel algorithms and SIMD instructions to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Our proposed solution can dynamically choose the optimal algorithm in the majority of cases and achieves speedups between 2.7x and 7.6x over a scalar baseline.\",\"PeriodicalId\":6634,\"journal\":{\"name\":\"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)\",\"volume\":\"167 1\",\"pages\":\"418-430\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3007787.3001182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3007787.3001182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

随着每年数据生成的速度呈指数级增长，人们需要快速汇总和总结大量信息。在过去，频率缩放是用来提高应用程序吞吐量的。今天，Dennard缩放已经停止，进一步的性能必须通过利用并行性来实现。单指令多数据(SIMD)指令集提供了一种利用数据级并行性(DLP)的高效且可扩展的方法。虽然微处理器最初提供了针对多媒体应用程序的非常简单的SIMD支持，但这些扩展在宽度和功能上都在不断增长。观察到这一趋势，我们使用一个模拟框架来模拟未来SIMD支持，然后提出并评估五种不同的向量化数据聚合方法。我们发现，尽管DLP中有大量的数据聚合，但它往往过于不规则，无法使用典型的SIMD指令有效地表达。基于这一观察，我们提出了一套新的算法和SIMD指令来更好地捕获这种不规则的DLP。此外，我们发现最好的算法高度依赖于输入的特征。我们提出的解决方案可以在大多数情况下动态选择最优算法，并在标量基线上实现2.7到7.6倍的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Future Vector Microprocessor Extensions for Data Aggregations

As the rate of annual data generation grows exponentially, there is a demand to aggregate and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased and further performance must come from exploiting parallelism. Single instruction-multiple data (SIMD) instruction sets offer a highly efficient and scalable way of exploiting data-level parallelism (DLP). While microprocessors originally offered very simple SIMD support targeted at multimedia applications, these extensions have been growing both in width and functionality. Observing this trend, we use a simulation framework to model future SIMD support and then propose and evaluate five different ways of vectorising data aggregation. We find that although data aggregation is abundant in DLP, it is often too irregular to be expressed efficiently using typical SIMD instructions. Based on this observation, we propose a set of novel algorithms and SIMD instructions to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Our proposed solution can dynamically choose the optimal algorithm in the majority of cases and achieves speedups between 2.7x and 7.6x over a scalar baseline.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)

自引率

0.00%

发文量