基于缓存的交叉迭代相干推测并行化

Andre Baixo, João Paulo Porto, G. Araújo
{"title":"基于缓存的交叉迭代相干推测并行化","authors":"Andre Baixo, João Paulo Porto, G. Araújo","doi":"10.1109/HiPC.2013.6799113","DOIUrl":null,"url":null,"abstract":"Maximal utilization of cores in multicore architectures is key to realize the potential performance available from higher density devices. In order to achieve scalable performance, parallelization techniques rely on carefully tunning speculative architecture support, run-time environment and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem. They either require deep (and risky) changes on the existing hardware and cache coherence protocols, or exhibit poor performance scalability for a range of applications. The addition of cache tags as an enabler for data versioning, recently announced by the industry (i.e. IBM BlueGene/Q), could allow a better exploitation of parallelism at the microarchitecture level. In this paper, we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache tagging approach for data versioning, which integrates smoothly with typical cache coherence protocols, not requiring any changes to them. Experimental results, using SPEC and PARSEC benchmarks, reveal substantial speedups in a 24-core simulated CMP, while demonstrate improved scalability when compared to a software-only approach.","PeriodicalId":206307,"journal":{"name":"20th Annual International Conference on High Performance Computing","volume":"17 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cache-based cross-iteration coherence for speculative parallelization\",\"authors\":\"Andre Baixo, João Paulo Porto, G. Araújo\",\"doi\":\"10.1109/HiPC.2013.6799113\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Maximal utilization of cores in multicore architectures is key to realize the potential performance available from higher density devices. In order to achieve scalable performance, parallelization techniques rely on carefully tunning speculative architecture support, run-time environment and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem. They either require deep (and risky) changes on the existing hardware and cache coherence protocols, or exhibit poor performance scalability for a range of applications. The addition of cache tags as an enabler for data versioning, recently announced by the industry (i.e. IBM BlueGene/Q), could allow a better exploitation of parallelism at the microarchitecture level. In this paper, we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache tagging approach for data versioning, which integrates smoothly with typical cache coherence protocols, not requiring any changes to them. Experimental results, using SPEC and PARSEC benchmarks, reveal substantial speedups in a 24-core simulated CMP, while demonstrate improved scalability when compared to a software-only approach.\",\"PeriodicalId\":206307,\"journal\":{\"name\":\"20th Annual International Conference on High Performance Computing\",\"volume\":\"17 2\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"20th Annual International Conference on High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPC.2013.6799113\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"20th Annual International Conference on High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC.2013.6799113","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

在多核架构中,最大限度地利用核心是实现高密度设备潜在性能的关键。为了实现可伸缩的性能,并行化技术依赖于仔细地调优推测的体系结构支持、运行时环境和基于软件的转换。已经提出了硬件和软件机制来解决这个问题。它们要么需要对现有硬件和缓存一致性协议进行深入(且有风险)的更改,要么在一系列应用程序中表现出较差的性能可伸缩性。最近,业界(例如IBM BlueGene/Q)宣布,添加缓存标签作为数据版本控制的推手,可以在微架构级别上更好地利用并行性。在本文中,我们提出了一个执行模型,该模型既支持基于dopipe的推测,也支持传统的推测并行化技术。它基于一种用于数据版本控制的简单缓存标记方法,该方法与典型的缓存一致性协议顺利集成,不需要对它们进行任何更改。使用SPEC和PARSEC基准测试的实验结果显示,在24核模拟CMP中有显着的加速,同时与仅使用软件的方法相比,证明了改进的可伸缩性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Cache-based cross-iteration coherence for speculative parallelization
Maximal utilization of cores in multicore architectures is key to realize the potential performance available from higher density devices. In order to achieve scalable performance, parallelization techniques rely on carefully tunning speculative architecture support, run-time environment and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem. They either require deep (and risky) changes on the existing hardware and cache coherence protocols, or exhibit poor performance scalability for a range of applications. The addition of cache tags as an enabler for data versioning, recently announced by the industry (i.e. IBM BlueGene/Q), could allow a better exploitation of parallelism at the microarchitecture level. In this paper, we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache tagging approach for data versioning, which integrates smoothly with typical cache coherence protocols, not requiring any changes to them. Experimental results, using SPEC and PARSEC benchmarks, reveal substantial speedups in a 24-core simulated CMP, while demonstrate improved scalability when compared to a software-only approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信