Model oriented profiling of parallel programs

J. González, C. León, J. R. García, C. Rodríguez, J. Rodríguez, F. D. Sande, A. M. Printista
Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, 2002. DOI: 10.1109/EMPDP.2002.994212
Citations: 0

Abstract

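The performance-prediction model presented here extends BSP (Bulk Synchronous Parallel) to cover both oblivious synchronization and group partitioning. These generalizations imply that different processors may finish the same superstep at different times. A second consideration is that even when two stages perform the same number of individual communication or computation operations, their actual times may differ. These differences stem from the distinct nature of the operations or from the particular pattern followed by the messages. Worse still, the assumption that a constant number of machine instructions takes constant time is far from true: with current memory hierarchies, a memory access can take anywhere from a few cycles to several thousand. A natural proposal is to associate a distinct proportionality constant with each basic block and, analogously, distinct latencies and bandwidths with each "communication block". Unfortunately, this approach means that the evaluation parameters depend not only on the target architecture but also on characteristics of the algorithm, so they must be evaluated anew for every algorithm. This is a heavy task involving experiment design, timing, statistics, pattern recognition, and multi-parameter fitting algorithms, and it requires software support. We have developed a compiler that takes as input a C program annotated with complexity formulas and produces instrumented code as output. The trace files obtained by executing this code are analyzed with an interactive interpreter that yields, among other information, the values of those parameters.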
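To make the idea of per-block parameters concrete, the following is a minimal sketch, assuming the simplest linear cost forms: a basic block's time is modeled as t ≈ α·n (a block-specific proportionality constant α), and a communication block's time as t ≈ L + g·h, where h is the number of words exchanged, g a per-word cost (inverse bandwidth) and L a latency. The function names and the synthetic samples are hypothetical; the abstract does not describe how the authors' interpreter performs its fitting, and real trace data would need the statistical treatment it mentions.

```python
# Hypothetical sketch, not the authors' tool: fit per-block cost parameters
# from (size, time) samples, as a trace analyzer might.


def fit_proportional(sizes, times):
    """Least-squares fit of t ~= alpha * n (a line through the origin).

    Models a basic block whose running time is assumed proportional to n;
    alpha is that block's own proportionality constant.
    """
    num = sum(n * t for n, t in zip(sizes, times))
    den = sum(n * n for n in sizes)
    if den == 0:
        raise ValueError("need at least one nonzero size")
    return num / den


def fit_latency_bandwidth(h_values, times):
    """Ordinary least-squares fit of t ~= L + g * h for one communication block.

    h: words exchanged (BSP h-relation); g: per-word cost; L: latency.
    """
    k = len(h_values)
    if k < 2:
        raise ValueError("need at least two samples")
    mean_h = sum(h_values) / k
    mean_t = sum(times) / k
    sxx = sum((h - mean_h) ** 2 for h in h_values)
    if sxx == 0:
        raise ValueError("h values must not all be equal")
    sxy = sum((h - mean_h) * (t - mean_t) for h, t in zip(h_values, times))
    g = sxy / sxx
    latency = mean_t - g * mean_h
    return g, latency


if __name__ == "__main__":
    # Synthetic samples generated exactly from t = 5 + 2h (illustrative only).
    hs = [1, 2, 3, 4]
    ts = [5 + 2 * h for h in hs]
    g, latency = fit_latency_bandwidth(hs, ts)
    print(f"g = {g:.3f}, L = {latency:.3f}")  # g = 2.000, L = 5.000
```

Because each block gets its own fitted constants, the same procedure would have to be repeated per block and per algorithm, which is exactly the workload the abstract argues calls for software support.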