对DOALL循环序列应用数组收缩

Yonghong Song, Zhiyuan Li
{"title":"对DOALL循环序列应用数组收缩","authors":"Yonghong Song, Zhiyuan Li","doi":"10.1109/ICPP.2004.1327903","DOIUrl":null,"url":null,"abstract":"Efficient program execution on multiprocessor computers requires both sufficient parallelism and good data locality. Recent research found that, using a combination of loop shifting, loop fusion, and array contraction, one can reduce the memory required to execute a sequence of serial loops, thereby to improve the cache locality. This paper studies how to extend such a memory-reduction scheme to a sequence of DOALL loops, which are executed in parallel on multiprocessors. Two methods are proposed to overcome difficulties caused by loop-carried dependences. Data copy-in is performed to remove anti-dependences between different parallel threads, and computation duplication is performed to remove flow dependences. Experiments performed on a number of benchmark programs show that the proposed technique improves both cache locality and parallel execution speed for the DOALL loops. The scheme achieves an average speedup of 1.41 for 17 programs on a 4-processor SUN machine.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"90 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Applying array contraction to a sequence of DOALL loops\",\"authors\":\"Yonghong Song, Zhiyuan Li\",\"doi\":\"10.1109/ICPP.2004.1327903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient program execution on multiprocessor computers requires both sufficient parallelism and good data locality. Recent research found that, using a combination of loop shifting, loop fusion, and array contraction, one can reduce the memory required to execute a sequence of serial loops, thereby to improve the cache locality. This paper studies how to extend such a memory-reduction scheme to a sequence of DOALL loops, which are executed in parallel on multiprocessors. Two methods are proposed to overcome difficulties caused by loop-carried dependences. Data copy-in is performed to remove anti-dependences between different parallel threads, and computation duplication is performed to remove flow dependences. Experiments performed on a number of benchmark programs show that the proposed technique improves both cache locality and parallel execution speed for the DOALL loops. The scheme achieves an average speedup of 1.41 for 17 programs on a 4-processor SUN machine.\",\"PeriodicalId\":106240,\"journal\":{\"name\":\"International Conference on Parallel Processing, 2004. ICPP 2004.\",\"volume\":\"90 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Parallel Processing, 2004. ICPP 2004.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2004.1327903\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Parallel Processing, 2004. ICPP 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2004.1327903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

在多处理器计算机上高效地执行程序需要足够的并行性和良好的数据局部性。最近的研究发现,使用循环移位、循环融合和阵列收缩的组合,可以减少执行一系列串行循环所需的内存,从而改善缓存局域性。本文研究了如何将这种内存缩减方案扩展到在多处理器上并行执行的DOALL循环序列。提出了两种方法来克服由环携带依赖性引起的困难。执行数据复制以消除不同并行线程之间的反依赖,执行计算重复以消除流依赖。在一些基准程序上进行的实验表明,所提出的技术提高了DOALL循环的缓存局部性和并行执行速度。该方案在一台4处理器的SUN机器上实现了17个程序平均1.41的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Applying array contraction to a sequence of DOALL loops
Efficient program execution on multiprocessor computers requires both sufficient parallelism and good data locality. Recent research found that, using a combination of loop shifting, loop fusion, and array contraction, one can reduce the memory required to execute a sequence of serial loops, thereby to improve the cache locality. This paper studies how to extend such a memory-reduction scheme to a sequence of DOALL loops, which are executed in parallel on multiprocessors. Two methods are proposed to overcome difficulties caused by loop-carried dependences. Data copy-in is performed to remove anti-dependences between different parallel threads, and computation duplication is performed to remove flow dependences. Experiments performed on a number of benchmark programs show that the proposed technique improves both cache locality and parallel execution speed for the DOALL loops. The scheme achieves an average speedup of 1.41 for 17 programs on a 4-processor SUN machine.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信