Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing

Incorporating memory layout in the modeling of message passing programs

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994294

F. Seinstra, D. Koelma

One of the most fundamental tasks of an automatic parallelization tool is to find an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations) this task may seem trivial. However, communication costs in message passing programs often significantly depend on the memory layout of data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper we introduce a new point-to-point communication model (called P-3PC) that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP) P-3PC is similar in complexity, but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of imaging applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network, and a different MPI implementation. Results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.
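The abstract above names LogGP as a related model. As a rough illustration of why a model that ignores memory layout can mislead the choice of decomposition, the sketch below evaluates the standard LogGP point-to-point estimate, o + (k-1)G + L + o, for two border exchanges of a row-major matrix. The parameter values, the matrix size, and the helper names are assumptions made up for illustration; this is not the P-3PC model, whose definition is not given in this listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    """LogGP parameters (all values used below are illustrative assumptions)."""
    L: float  # network latency, in microseconds
    o: float  # per-message send/receive overhead, in microseconds
    G: float  # gap per byte for long messages, in microseconds per byte


def loggp_send_time(model: LogGP, nbytes: int) -> float:
    """Estimated end-to-end time for one message of nbytes: o + (k-1)G + L + o."""
    if nbytes < 1:
        raise ValueError("message must contain at least one byte")
    return model.o + (nbytes - 1) * model.G + model.L + model.o


def contiguous_runs(rows: int, cols: int, matrix_width: int) -> int:
    """Number of separate contiguous memory runs occupied by a rows x cols
    block inside a row-major matrix that is matrix_width elements wide."""
    return 1 if cols == matrix_width else rows


if __name__ == "__main__":
    model = LogGP(L=50.0, o=5.0, G=0.01)
    n, elem_size = 1024, 4  # hypothetical 1024 x 1024 matrix of 4-byte elements

    # One border row is a single contiguous run; one border column is strided.
    row_bytes = n * elem_size
    col_bytes = n * elem_size
    print("row strip:", loggp_send_time(model, row_bytes), "us in",
          contiguous_runs(1, n, n), "run(s)")
    print("column strip:", loggp_send_time(model, col_bytes), "us in",
          contiguous_runs(n, 1, n), "run(s)")
```

Both strips receive the same estimate because the formula sees only the byte count, although the column is scattered over 1024 separate memory runs. The abstract's claim is that measured costs often depend on exactly this difference.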

Citations: 4

Increasing the adaptivity of routing algorithms for k-ary n-cubes

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994333

Elvira Baydal, P. López, J. Duato

Citations: 9

On improving the performance of data partitioning oriented parallel irregular reductions

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994330

E. Gutiérrez, O. Plata, E. Zapata

This paper classifies parallelization techniques for reductions into two classes: LPO (loop partitioning-oriented techniques) and DPO (data partitioning-oriented techniques). We have analyzed both classes in terms of a set of performance properties: data locality, memory overhead, parallelism and workload balancing. We propose several techniques to increase the exploited parallelism and to introduce load balancing into a DPO method. Regarding parallelism, the solution is based on the partial expansion of the reduction array. For load balancing, the first technique is generic, as it can deal with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance that appears when there are a large number of write operations on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for the DWA-LIP (data write affinity-loop index prefetching) DPO method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods.
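To make the data-partitioning idea concrete, here is a minimal sketch of an irregular reduction `A[idx[i]] += val[i]` in which each worker owns one block of the reduction array and executes only the iterations that write into its block. It assumes a single write per iteration and a plain block distribution; the function names are hypothetical, the workers are simulated sequentially, and this is a simplification rather than the DWA-LIP method itself.

```python
from typing import List, Sequence, Tuple


def sequential_reduction(n_bins: int, idx: Sequence[int],
                         val: Sequence[float]) -> List[float]:
    """Reference irregular reduction: A[idx[i]] += val[i]."""
    a = [0.0] * n_bins
    for element, v in zip(idx, val):
        a[element] += v
    return a


def block_owner(element: int, n_bins: int, n_workers: int) -> int:
    """Worker that owns a reduction-array element under a block distribution."""
    block = -(-n_bins // n_workers)  # ceiling division
    return element // block


def dpo_reduction(n_bins: int, idx: Sequence[int], val: Sequence[float],
                  n_workers: int) -> Tuple[List[float], List[int]]:
    """Data-partitioned reduction; returns the result and per-worker load."""
    a = [0.0] * n_bins

    # Inspector: sort iterations by the owner of the element they write.
    work: List[List[int]] = [[] for _ in range(n_workers)]
    for it, element in enumerate(idx):
        work[block_owner(element, n_bins, n_workers)].append(it)

    # Executor: every worker writes only inside its own block, so these
    # per-worker loops could run concurrently without locks or private copies.
    for w in range(n_workers):
        for it in work[w]:
            a[idx[it]] += val[it]

    return a, [len(iterations) for iterations in work]


if __name__ == "__main__":
    idx = [0, 1, 1, 7, 1, 3, 1, 1]  # many writes hit one small region
    val = [1.0] * len(idx)
    result, load = dpo_reduction(8, idx, val, n_workers=4)
    print("result:", result)
    print("iterations per worker:", load)
```

The per-worker iteration counts expose the imbalance that arises when writes concentrate on a small region of the reduction array, which is the special case the abstract singles out.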

Citations: 2

On the impossibility of implementing perpetual failure detectors in partially synchronous systems

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994241

M. Larrea, Antonio Fernández, S. Arévalo

Citations: 12

Removing the latency overhead of the ITB mechanism in COWs with source routing

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994334

J. Flich, Manuel P. Malumbres, P. López, J. Duato

Clusters of workstations (COWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. The in-transit buffer (ITB) mechanism can improve network performance when applied to COWs with irregular topology and source routing. This mechanism considerably improves the performance of this kind of network when compared to current source routing algorithms; however, it introduces a latency penalty. An implementation of this mechanism was performed, showing that the latency overhead of the mechanism may be noticeable, especially for short messages and at low network loads. In this paper, we analyze in detail the latency overhead of ITBs, proposing several mechanisms to reduce, hide and remove it. Firstly, we show, by simulation, the effect of an ITB implementation that is much slower than the one implemented. Then we propose three mechanisms that try to overcome the latency penalty. All the mechanisms are simple and can be easily implemented; also, they are out of the critical path of the ITB packet-processing procedure. The results show very good behaviour of the proposed mechanisms, considerably reducing or even completely removing the latency overhead.

Citations: 0

Nodes bearing grudges: towards routing security, fairness, and robustness in mobile ad hoc networks

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994321

S. Buchegger, J. Boudec

Citations: 462

Geometric scheduling of 2-D UET-UCT uniform dependence loops

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994305

Ioannis Drositis, T. Andronikos, G. Manis, G. Papakonstantinou, N. Koziris

Citations: 0

Model oriented profiling of parallel programs

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994212

J. González, C. León, J. R. García, C. Rodríguez, J. Rodríguez, F. D. Sande, A. M. Printista

The prediction analysis model presented extends BSP to cover both oblivious synchronization and group partitioning. These generalizations imply that different processors may finish the same superstep at different times. The other consideration is that, even if the numbers of individual communication or computation operations in two stages are the same, the actual times for these two stages may differ. These differences are due to the separate nature of the operations or to the particular pattern followed by the messages. Even worse, the assumption that a constant number of machine instructions takes constant time is far from the truth. Current memory hierarchies imply that memory access times vary from a few cycles to several thousands. A natural proposal is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, using this approach implies that the evaluation parameters not only depend on the given architecture, but also reflect algorithm characteristics. Such parameter evaluation must be done for every algorithm. This is a heavy task, implying experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We have developed a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter giving us, among other information, the values of those parameters.
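For orientation, the sketch below contrasts the classic BSP superstep cost, w + h*g + l, with a per-block estimate in the spirit of the abstract above, where every basic block carries its own time-per-operation constant and every communication block its own latency and per-word cost. The class names, the way the terms are summed, and all numbers are assumptions for illustration; the paper's actual model and its fitted parameters are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable


def bsp_superstep_cost(w: float, h: int, g: float, l: float) -> float:
    """Classic BSP cost of one superstep: w + h*g + l, with w the largest local
    computation, h the largest number of words any processor sends or receives,
    g the per-word gap, and l the synchronization latency."""
    return w + h * g + l


@dataclass(frozen=True)
class BasicBlock:
    ops: int               # operation count taken from the complexity formula
    time_per_op: float     # proportionality constant fitted for this block


@dataclass(frozen=True)
class CommBlock:
    words: int             # words transferred in this communication block
    latency: float         # latency fitted for this communication pattern
    time_per_word: float   # inverse bandwidth fitted for this pattern


def per_block_cost(compute: Iterable[BasicBlock],
                   comm: Iterable[CommBlock]) -> float:
    """Sum of per-block estimates, each block using its own constants."""
    compute_time = sum(b.ops * b.time_per_op for b in compute)
    comm_time = sum(c.latency + c.words * c.time_per_word for c in comm)
    return compute_time + comm_time


if __name__ == "__main__":
    # Two blocks with the same operation count but different fitted constants,
    # e.g. one cache-friendly loop and one that misses in cache.
    blocks = [BasicBlock(ops=1000, time_per_op=1.0),
              BasicBlock(ops=1000, time_per_op=4.0)]
    exchanges = [CommBlock(words=10, latency=50.0, time_per_word=2.0)]
    print("classic BSP:", bsp_superstep_cost(w=2000.0, h=10, g=2.0, l=50.0))
    print("per-block  :", per_block_cost(blocks, exchanges))
```

With a single global constant the two compute blocks would be indistinguishable; giving each its own constant is what makes the parameters depend on the algorithm as well as on the machine, which is the cost the abstract warns about.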

Citations: 0

Flexible service provision considering specific customer resource needs

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994283

D. Thißen

The development of global networks like the Internet has opened new possibilities for the co-operation of various organisations. A computing resource can be offered by one organisation and it can be remotely used by customers, i.e. other organisations or individuals, to perform some task or access some service on it. Such a resource not only has to be provided for a suitable price but, additionally, it has to be deployed in an efficient way, promising a good performance in service provision to satisfy the customers. Because existing infrastructures have to be integrated and used in the service provision process, it becomes necessary to develop new concepts for the management of the arising service-oriented distributed systems and the resources involved. This paper discusses a mechanism for the performance management of services in distributed environments. A service trader is used as a central component, supporting a customer in choosing a suitable service while considering the global state of the distributed system's resources using a load balancer. Management proxies encapsulate services or service groups and observe the performance and availability characteristics of the resources involved in a service usage process to fulfil the quality characteristics of a mediated service. This approach is designed to cause minimal involvement of service providers and customers in the selection and management process.

Citations: 1

Dynamically reconfigurable system-on-programmable-chip

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing Pub Date : 2002-01-09 DOI: 10.1109/EMPDP.2002.994277

Heiko Kalte, D. Langen, E. Vonnahme, A. Brinkmann, U. Rückert

Citations: 36