Cache-based cross-iteration coherence for speculative parallelization

20th Annual International Conference on High Performance Computing (HiPC). Pub Date: 2013-12-01. DOI: 10.1109/HiPC.2013.6799113

Andre Baixo, João Paulo Porto, G. Araújo

Citation count: 0

Abstract

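Maximal utilization of cores in multicore architectures is key to realizing the potential performance of higher-density devices. To achieve scalable performance, parallelization techniques rely on carefully tuning speculative architecture support, the run-time environment, and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem, but they either require deep (and risky) changes to existing hardware and cache coherence protocols, or exhibit poor performance scalability for a range of applications. The addition of cache tags as an enabler for data versioning, recently announced by industry (e.g., IBM BlueGene/Q), could allow better exploitation of parallelism at the microarchitecture level. In this paper, we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache-tagging approach for data versioning that integrates smoothly with typical cache coherence protocols and requires no changes to them. Experimental results using SPEC and PARSEC benchmarks show substantial speedups on a 24-core simulated CMP and demonstrate improved scalability compared with a software-only approach.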
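The abstract only outlines the idea of using cache tags for data versioning. To make that idea concrete, here is a minimal Python sketch of *generic* tag-based versioning for speculative loop iterations. It assumes that each "cache line" can hold several versions of an address, each tagged with the ID of the iteration that wrote it; that iterations commit in program order; and that a cross-iteration read-after-write violation squashes and re-executes the reader. This is not the authors' protocol. It models neither coherence messages nor DOPIPE pipelining, and the names `VersionedMemory` and `run_speculative` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical loop-body signature: body(i, read, write).
Body = Callable[[int, Callable[[int], int], Callable[[int, int], None]], None]


class VersionedMemory:
    """Toy model of tag-based data versioning (an illustration, not the
    paper's design): each address may hold several speculative versions,
    each tagged with the id of the loop iteration that wrote it."""

    def __init__(self, committed: Dict[int, int]) -> None:
        self.committed = dict(committed)                              # architectural state
        self.versions: Dict[int, Dict[int, int]] = defaultdict(dict)  # addr -> {tag: value}
        self.reads: Dict[int, Dict[int, int]] = defaultdict(dict)     # tag -> {addr: source tag, -1 = committed}
        self.squashed: Set[int] = set()

    def read(self, tag: int, addr: int) -> int:
        # Forward the newest version written by this or an earlier iteration.
        visible = [t for t in self.versions[addr] if t <= tag]
        src = max(visible) if visible else -1
        self.reads[tag].setdefault(addr, src)  # remember only the first (exposed) read
        return self.versions[addr][src] if src >= 0 else self.committed.get(addr, 0)

    def write(self, tag: int, addr: int, value: int) -> None:
        self.versions[addr][tag] = value  # a new version, so WAR/WAW need no squash
        # A later iteration that already consumed an older version of addr
        # read stale data: a cross-iteration read-after-write violation.
        for later, rs in self.reads.items():
            if later > tag and addr in rs and rs[addr] < tag:
                self.squashed.add(later)

    def squash(self, tag: int) -> None:
        for vs in self.versions.values():
            vs.pop(tag, None)  # discard this iteration's speculative versions
        self.reads.pop(tag, None)
        self.squashed.discard(tag)
        # Cascade: later iterations that consumed this iteration's data are stale too.
        for later, rs in self.reads.items():
            if later > tag and tag in rs.values():
                self.squashed.add(later)

    def commit(self, tag: int) -> None:
        # Merge this iteration's versions into the committed state.
        for addr, vs in self.versions.items():
            if tag in vs:
                self.committed[addr] = vs.pop(tag)
        self.reads.pop(tag, None)


def run_speculative(n: int, body: Body, memory: Dict[int, int],
                    order: Iterable[int]) -> Tuple[Dict[int, int], int]:
    """Run iterations 0..n-1 speculatively in `order`, then commit in program order.
    Returns the committed memory and the number of squashed iterations."""
    vm = VersionedMemory(memory)

    def execute(i: int) -> None:
        body(i, lambda a: vm.read(i, a), lambda a, v: vm.write(i, a, v))

    for i in order:      # speculative phase: iterations may run out of order
        execute(i)
    squashes = 0
    for i in range(n):   # in-order commit
        if i in vm.squashed:
            vm.squash(i)
            execute(i)   # all earlier iterations have committed, so this run is correct
            squashes += 1
        vm.commit(i)
    return vm.committed, squashes


if __name__ == "__main__":
    # a[i+1] = a[i] + 1: a loop-carried (cross-iteration) RAW dependence.
    def body(i, read, write):
        write(i + 1, read(i) + 1)

    mem = {k: 0 for k in range(6)}
    print(run_speculative(5, body, mem, order=[4, 3, 2, 1, 0]))
    # -> ({0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}, 4)
```

Running the dependent loop in reverse order forces every iteration after the first to be squashed, yet the committed result still matches sequential execution. Running it in program order, or running a loop with no cross-iteration reads in any order, commits with zero squashes. Because each write creates a new tagged version, write-after-read and write-after-write conflicts never trigger a squash. In this model only true cross-iteration dependences cost re-execution, which is the motivation for versioning in the first place.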
