最小割程序分解法的循环性能改进

K. Ootsu, Takeshi Abe, T. Yokota, T. Baba
{"title":"最小割程序分解法的循环性能改进","authors":"K. Ootsu, Takeshi Abe, T. Yokota, T. Baba","doi":"10.1109/IC-NC.2010.47","DOIUrl":null,"url":null,"abstract":"In recent years, speedup by the thread level parallel processing becomes more and more important with the spread of multi-core processors, and various techniques for parallelizing the single thread code into the efficient multithreaded code that can achieve efficient thread level parallel processing have been developed. The speculative multithreading is an important technology for achieving the high performance by the thread level parallel processing. To improve the execution performance by speculative multithreading, it is necessary to appropriately decompose the program code. Against the background of this, T. A. Johnson, et al. proposed a program decomposition technique (hereafter, we refer to their method as min-cut method) for finding the decomposition pattern that can minimize the effects of the performance degradation factors, by finding the minimum cut set in the weighted control flow graph (CFG) of the program. The min-cut method is wide coverage and a very promising technique since the whole program can be decomposed without being restricted to the logical structures, such as loop. However, the min-cut method has a problem that the loop level parallelism cannot be utilized enough. In this paper, we propose an improved method for the min-cut method to enhance the loop execution performance. Our method is based on the min-cut method and tries to apply the loop unrolling to the loop in the target program codes during the process of the decomposition. We apply our method to the practical program codes selected from the SPEC CINT2000 benchmarks. The results show that the loops, that are not decomposed with the min-cut method, are decomposed, and the possibilities of making use of the loop level parallelism increased. In addition, the results of the performance evaluation by using a cycle-based simulator show that our method can improve the loop execution performance, as compared to the min-cut method.","PeriodicalId":375145,"journal":{"name":"2010 First International Conference on Networking and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Loop Performance Improvement for Min-cut Program Decomposition Method\",\"authors\":\"K. Ootsu, Takeshi Abe, T. Yokota, T. Baba\",\"doi\":\"10.1109/IC-NC.2010.47\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, speedup by the thread level parallel processing becomes more and more important with the spread of multi-core processors, and various techniques for parallelizing the single thread code into the efficient multithreaded code that can achieve efficient thread level parallel processing have been developed. The speculative multithreading is an important technology for achieving the high performance by the thread level parallel processing. To improve the execution performance by speculative multithreading, it is necessary to appropriately decompose the program code. Against the background of this, T. A. Johnson, et al. proposed a program decomposition technique (hereafter, we refer to their method as min-cut method) for finding the decomposition pattern that can minimize the effects of the performance degradation factors, by finding the minimum cut set in the weighted control flow graph (CFG) of the program. The min-cut method is wide coverage and a very promising technique since the whole program can be decomposed without being restricted to the logical structures, such as loop. However, the min-cut method has a problem that the loop level parallelism cannot be utilized enough. In this paper, we propose an improved method for the min-cut method to enhance the loop execution performance. Our method is based on the min-cut method and tries to apply the loop unrolling to the loop in the target program codes during the process of the decomposition. We apply our method to the practical program codes selected from the SPEC CINT2000 benchmarks. The results show that the loops, that are not decomposed with the min-cut method, are decomposed, and the possibilities of making use of the loop level parallelism increased. In addition, the results of the performance evaluation by using a cycle-based simulator show that our method can improve the loop execution performance, as compared to the min-cut method.\",\"PeriodicalId\":375145,\"journal\":{\"name\":\"2010 First International Conference on Networking and Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 First International Conference on Networking and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IC-NC.2010.47\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 First International Conference on Networking and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IC-NC.2010.47","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

近年来,随着多核处理器的普及,线程级并行处理的加速变得越来越重要,各种将单线程代码并行化为能够实现高效线程级并行处理的高效多线程代码的技术被开发出来。推测式多线程是通过线程级并行处理实现高性能的重要技术。为了提高推测多线程的执行性能,有必要对程序代码进行适当的分解。在此背景下,t.a.j hson等人提出了一种程序分解技术(以下简称其方法为min-cut法),通过在程序的加权控制流图(CFG)中找到最小割集,来寻找能使性能退化因素影响最小化的分解模式。最小分割法是一种很有前途的技术,因为它可以在不局限于循环等逻辑结构的情况下对整个程序进行分解。然而,最小割法存在着不能充分利用循环级并行性的问题。在本文中,我们提出了一种改进的最小切割方法来提高循环的执行性能。我们的方法是基于最小切割方法,并试图在分解过程中将循环展开应用于目标程序代码中的循环。我们将我们的方法应用于从SPEC CINT2000基准中选择的实际程序代码。结果表明,不能用最小切割法分解的循环被分解了,增加了利用循环级并行性的可能性。此外,利用基于循环的模拟器进行性能评估的结果表明,与最小切割方法相比,我们的方法可以提高循环执行性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Loop Performance Improvement for Min-cut Program Decomposition Method
In recent years, speedup by the thread level parallel processing becomes more and more important with the spread of multi-core processors, and various techniques for parallelizing the single thread code into the efficient multithreaded code that can achieve efficient thread level parallel processing have been developed. The speculative multithreading is an important technology for achieving the high performance by the thread level parallel processing. To improve the execution performance by speculative multithreading, it is necessary to appropriately decompose the program code. Against the background of this, T. A. Johnson, et al. proposed a program decomposition technique (hereafter, we refer to their method as min-cut method) for finding the decomposition pattern that can minimize the effects of the performance degradation factors, by finding the minimum cut set in the weighted control flow graph (CFG) of the program. The min-cut method is wide coverage and a very promising technique since the whole program can be decomposed without being restricted to the logical structures, such as loop. However, the min-cut method has a problem that the loop level parallelism cannot be utilized enough. In this paper, we propose an improved method for the min-cut method to enhance the loop execution performance. Our method is based on the min-cut method and tries to apply the loop unrolling to the loop in the target program codes during the process of the decomposition. We apply our method to the practical program codes selected from the SPEC CINT2000 benchmarks. The results show that the loops, that are not decomposed with the min-cut method, are decomposed, and the possibilities of making use of the loop level parallelism increased. In addition, the results of the performance evaluation by using a cycle-based simulator show that our method can improve the loop execution performance, as compared to the min-cut method.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信