基于组件标注和循环携带概率的DOACROSS并行化

2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2018-09-01 DOI:10.1109/CAHPC.2018.8645904

Luis Mattos, D. C. S. Lucas, Juan Salamanca, J. P. L. Carvalho, M. Pereira, G. Araújo

{"title":"基于组件标注和循环携带概率的DOACROSS并行化","authors":"Luis Mattos, D. C. S. Lucas, Juan Salamanca, J. P. L. Carvalho, M. Pereira, G. Araújo","doi":"10.1109/CAHPC.2018.8645904","DOIUrl":null,"url":null,"abstract":"Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried dependences (DOALL) or that contain well-known structured dependence patterns (e.g. reduction). These restrictions preclude the parallelization of many computational intensive DOACROSS loops. In such loops, either the compiler finds at least one loop-carried dependence or it cannot prove, at compile-time, that the loop is free of such dependences, even though they might never show-up at runtime. In any case, most compilers end-up not parallelizing DOACROSS loops. This paper brings three contributions to address this problem. First, it integrates three algorithms (TLS, DOAX, and BDX) into a simple openMP clause that enables the programmer to select the best algorithm for a given loop. Second, it proposes an annotation approach to separate the sequential components of a loop, thus exposing other components to parallelization. Finally, it shows that loop-carried probability is an effective metric to decide when to use TLS or other non-speculative techniques (e.g. DOAX or BDX) to parallelize DOACROSS loops. Experimental results reveal that, for certain loops, slow-downs can be transformed in 2×speed-ups by quickly selecting the appropriate algorithm.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability\",\"authors\":\"Luis Mattos, D. C. S. Lucas, Juan Salamanca, J. P. L. Carvalho, M. Pereira, G. Araújo\",\"doi\":\"10.1109/CAHPC.2018.8645904\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried dependences (DOALL) or that contain well-known structured dependence patterns (e.g. reduction). These restrictions preclude the parallelization of many computational intensive DOACROSS loops. In such loops, either the compiler finds at least one loop-carried dependence or it cannot prove, at compile-time, that the loop is free of such dependences, even though they might never show-up at runtime. In any case, most compilers end-up not parallelizing DOACROSS loops. This paper brings three contributions to address this problem. First, it integrates three algorithms (TLS, DOAX, and BDX) into a simple openMP clause that enables the programmer to select the best algorithm for a given loop. Second, it proposes an annotation approach to separate the sequential components of a loop, thus exposing other components to parallelization. Finally, it shows that loop-carried probability is an effective metric to decide when to use TLS or other non-speculative techniques (e.g. DOAX or BDX) to parallelize DOACROSS loops. Experimental results reveal that, for certain loops, slow-downs can be transformed in 2×speed-ups by quickly selecting the appropriate algorithm.\",\"PeriodicalId\":307747,\"journal\":{\"name\":\"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CAHPC.2018.8645904\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAHPC.2018.8645904","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

尽管现代编译器实现了许多循环并行化技术，但它们的应用通常仅限于没有循环携带依赖关系(DOALL)或包含众所周知的结构化依赖模式(例如缩减)的循环。这些限制排除了许多计算密集型DOACROSS循环的并行化。在这样的循环中，编译器要么找到至少一个循环携带的依赖项，要么无法在编译时证明循环没有这样的依赖项，即使它们可能永远不会在运行时出现。在任何情况下，大多数编译器最终都不会并行化DOACROSS循环。本文为解决这一问题做出了三方面的贡献。首先，它将三种算法(TLS、DOAX和BDX)集成到一个简单的openMP子句中，使程序员能够为给定的循环选择最佳算法。其次，它提出了一种注释方法来分离循环的顺序组件，从而将其他组件暴露于并行化。最后，它表明环携带概率是决定何时使用TLS或其他非推测技术(例如DOAX或BDX)并行DOACROSS循环的有效度量。实验结果表明，对于某些循环，通过快速选择适当的算法可以在2×speed-ups中转换慢速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability

Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried dependences (DOALL) or that contain well-known structured dependence patterns (e.g. reduction). These restrictions preclude the parallelization of many computational intensive DOACROSS loops. In such loops, either the compiler finds at least one loop-carried dependence or it cannot prove, at compile-time, that the loop is free of such dependences, even though they might never show-up at runtime. In any case, most compilers end-up not parallelizing DOACROSS loops. This paper brings three contributions to address this problem. First, it integrates three algorithms (TLS, DOAX, and BDX) into a simple openMP clause that enables the programmer to select the best algorithm for a given loop. Second, it proposes an annotation approach to separate the sequential components of a loop, thus exposing other components to parallelization. Finally, it shows that loop-carried probability is an effective metric to decide when to use TLS or other non-speculative techniques (e.g. DOAX or BDX) to parallelize DOACROSS loops. Experimental results reveal that, for certain loops, slow-downs can be transformed in 2×speed-ups by quickly selecting the appropriate algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

自引率

0.00%

发文量