Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99: Latest Papers

Optimizing network throughput: optimal versus robust design
P. López, R. Alcover, J. Duato, L. Zúnica
{"title":"Optimizing network throughput: optimal versus robust design","authors":"P. López, R. Alcover, J. Duato, L. Zúnica","doi":"10.1109/EMPDP.1999.746644","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746644","url":null,"abstract":"Interconnection network performance is usually measured in terms of its latency (time required to deliver a message) and throughput (maximum traffic accepted by the network). At first glance, minimizing average message latency is the designer's main goal, because average network traffic is usually far from saturation. However, applications can also generate very high peak traffic. To deal with such situations, it is important that network throughput is also high. On the other hand, interconnection network performance depends on several parameters. Some of them can be chosen by the designer: routing algorithm, switching technique, topology and node design parameters. However, there are other parameters that cannot be selected by the designer. Among these, there are parameters that depend on the application, such as message size, message destination distribution and message traffic, as well as parameters defined by the customer, such as network size. The network designer can select the design parameters that maximize average performance (optimal design) or the design parameters that achieve good performance under all the feasible combinations of the parameters that the designer cannot select (robust design). Notice that the two alternatives do not always lead to the same parameter configuration. Previously we chose the design parameters of a k-ary n-cube network to optimize latency. In that case, optimal and robust design led to the same choice. In this paper, we obtain these design parameters to optimize network throughput. Unfortunately, there is a discrepancy between the optimal and robust design criteria, with the former being the best choice.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115141150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
The split data cache in multiprocessor systems: an initial hit ratio analysis
J. Sahuquillo, A. Pont
{"title":"The split data cache in multiprocessor systems: an initial hit ratio analysis","authors":"J. Sahuquillo, A. Pont","doi":"10.1109/EMPDP.1999.746641","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746641","url":null,"abstract":"As current first level (L1) data caches are poorly and inefficiently managed, new approaches to achieve better performance in uniprocessor systems have been proposed. The L1 data cache management system is basically the same as it was three decades ago. New organizations have recently been proposed, where two multi-lateral caches are included in the first level in accordance with the data locality where they are stored. The processor simultaneously sends the same memory request to both caches located in L1. These caches work independently and have different organizations. The main objective is to minimize the average data access time. These new organizations will normally increase the hit ratio. Additionally, the chip area occupied by these caches-including the necessary management hardware-is smaller than in a conventional organization. As the proposed cache size is smaller, it can work faster and improve access time at this level. Several authors have studied different approaches around this idea in uniprocessors. In this work we have made extensions for shared memory multiprocessors and studied the advantages.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114151714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Testing and debugging message passing applications based on the synergy of program and specification executions
Z. Tsiatsoulis, Y. Cotronis, E. Floros
{"title":"Testing and debugging message passing applications based on the synergy of program and specification executions","authors":"Z. Tsiatsoulis, Y. Cotronis, E. Floros","doi":"10.1109/EMPDP.1999.746668","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746668","url":null,"abstract":"We outline Ensemble, a design and implementation methodology for composing message passing (MP) applications from program components. We also outline specification composition, directly associated with application composition. We present the integration of specification and implementation of program development. We particularly elaborate on testing and debugging of MP applications based on the synergy of tools for specification simulations with tools for program execution visualisation.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131835285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Parallel resolution of alternating-line processes by means of pipelining techniques
David Espadas, M. Prieto, I. Llorente, F. Tirado
{"title":"Parallel resolution of alternating-line processes by means of pipelining techniques","authors":"David Espadas, M. Prieto, I. Llorente, F. Tirado","doi":"10.1109/EMPDP.1999.746691","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746691","url":null,"abstract":"The aim of this paper is to present an easy and efficient method to implement alternating-line processes on current parallel computers. First we show how data locality has an important impact on global efficiency, which leads us to the conclusion that one-dimensional compositions are the most convenient ones for 2D problems. Once this is asserted, a parallel algorithm is presented for the solution of the distributed tridiagonal systems along the partitioned direction. The key idea is to pipeline the simultaneous resolution of many systems of equations, not parallelising each resolution separately. This approach presents good numerical and architectural properties, in terms of memory usage and data locality, and high parallel efficiencies are obtained. For the case of alternating-line processes, the election of the optimal decomposition is studied. The experimental results have been obtained on a Cray T3E.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115700670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
The impact of cache organisation on the instruction issue rate of a superscalar processor
L. Vintan, Cristian Armat, G. Steven
{"title":"The impact of cache organisation on the instruction issue rate of a superscalar processor","authors":"L. Vintan, Cristian Armat, G. Steven","doi":"10.1109/EMPDP.1999.746646","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746646","url":null,"abstract":"Much of the research on multiple-instruction-issue processor architecture assumes a perfect memory hierarchy and concentrates on increasing the instruction issue rate of the processor either through aggressive out-of-order instruction issue or through static instruction scheduling. In this paper we describe a trace driven simulation tool that we have developed to quantify the impact of the memory hierarchy on the performance of a superscalar processor that we have developed to support static instruction scheduling. We describe some initial studies performed using our simulator. As well as examining the more conventional split cache configurations, we also quantify the performance impact of using a unified cache. Finally, we examine the benefits of using two-level caches and victim caches.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132159344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Performance evaluation of the bubble algorithm: benefits for k-ary n-cubes
C. Carrión, R. Beivide, J. Gregorio
{"title":"Performance evaluation of the bubble algorithm: benefits for k-ary n-cubes","authors":"C. Carrión, R. Beivide, J. Gregorio","doi":"10.1109/EMPDP.1999.746699","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746699","url":null,"abstract":"The bubble algorithm evaluated in this paper assures message deadlock freedom in k-ary n-cube networks without using virtual channels. This algorithm is based both on dimension order routing (DOR) and on a restricted injection policy extended to the dimension changes. An exhaustive comparison between the bubble mechanism and the classical deterministic virtual channels solution is presented here. For that purpose, the message router of both proposals has been designed by using VHDL descriptions and the Synopsys VLSI CAD tool. Additionally, formal models of the routers, based on colored Petri nets, have been carried out together with simulation techniques in order to assure the validation of the results and shorten the design cycle. The performance evaluation of n-dimension tori highlights the benefits of the bubble algorithm as both the temporal delay and the necessary silicon area of the message router are reduced.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126841085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A framework backbone for software fault tolerance in embedded parallel applications
Geert Deconinck, M. Truyens, V. D. Florio, W. Rosseel, R. Lauwereins, R. Belmans
{"title":"A framework backbone for software fault tolerance in embedded parallel applications","authors":"Geert Deconinck, M. Truyens, V. D. Florio, W. Rosseel, R. Lauwereins, R. Belmans","doi":"10.1109/EMPDP.1999.746666","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746666","url":null,"abstract":"The DIR net (detection-isolation-recovery net) is the main module of a software framework for the development of embedded supercomputing applications. This framework provides a set of functional elements, collected in a library, to improve the dependability attributes of the applications (especially the availability). The DIR net enables these functional elements to cooperate and enhances their efficiency by controlling and co-ordinating them. As a supervisor and the main executor of the fault tolerance strategy, it is the backbone of the framework, of which the application developer is the architect. Moreover, it provides an interface to which all detection and recovery tools should conform. Although the DIR net is meant to be used together within this fault tolerance framework, the adopted concepts and design decisions have a more general value, and can be applied in a wide range of parallel systems.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"179 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123030112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Dynamic load adaption in LIPS
Thomas Setz
{"title":"Dynamic load adaption in LIPS","authors":"Thomas Setz","doi":"10.1109/EMPDP.1999.746702","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746702","url":null,"abstract":"LIPS is a system for distributed computing using idle-cycles in heterogeneous networks of workstations. Especially data- and compute-intensive applications in the field of cryptography and computer algebra have used the system. The system provides its user with the tuple space based generative communication paradigm of parallel computing as known from the coordination language LINDA. In LIPS, failures (fail stop failures) like crashed machines are handled transparently for the application. Dynamic Load Adaption, meaning removing application processes from machines that are no longer idle and migrating those processes to idle machines, is based on the detection of crashed application processes and the (re)start of application processes on an idle machine. The implementation of Dynamic Load Adaption for LIPS applications is easy, because checkpoint generation and the restart from a checkpoint are independent of the other application processes. As the crash of an application process (assuming the machine and the operating system the application process resides on survive) can be detected very fast, the mechanism used allows for fast adaption of the application's distribution to changes in the NOW availability.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115300366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A replicated resource architecture for high performance network service
C. Allison, M. Bramley, Jose Serrano
{"title":"A replicated resource architecture for high performance network service","authors":"C. Allison, M. Bramley, Jose Serrano","doi":"10.1109/EMPDP.1999.746652","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746652","url":null,"abstract":"Distributed Learning Environments represent the hope that communications and information technology can improve and widen access to education while maintaining and improving its quality. Such environments consist of network applications and services. Good interactive response time is crucial to their success. Slow responses can quickly dissuade teachers and learners alike from investing their time in the use of these services. Responsiveness timings taken across 155 Mb/s IP/ATM networks have exposed traditional monolithic server performance as the main bottleneck in interactive response time. A strategy of providing bigger and faster monolithic server hardware in response to each occurrence of system slow down is not a good solution as it is expensive and inflexible. Cluster computing has proven a successful and cost effective alternative to conventional supercomputing and it would now seem to be appropriate to investigate its application to the problem of high performance network service provision. In order to research this issue a replicated resource architecture has been designed to harness the combined power of multiple independent computers. The architecture is outlined and an initial implementation of its core component, a coherence server, is described. Results are presented which indicate that this approach is viable within the context of Distributed Learning Environments.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132366022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
On the performance of nearest-neighbors load balancing algorithms in parallel systems
A. Cortés, A. Ripoll, M. A. Senar, P. Pons, E. Luque
{"title":"On the performance of nearest-neighbors load balancing algorithms in parallel systems","authors":"A. Cortés, A. Ripoll, M. A. Senar, P. Pons, E. Luque","doi":"10.1109/EMPDP.1999.746661","DOIUrl":"https://doi.org/10.1109/EMPDP.1999.746661","url":null,"abstract":"DASUD (Diffusion Algorithm Searching Unbalanced Domains) is a totally distributed load-balancing algorithm which belongs to the nearest-neighbors class. DASUD detects unbalanced domains (a processor and its immediate neighbors) and corrects this situation by allowing load movements between non-connected processors. DASUD has been evaluated by comparison with two well-known nearest-neighbors load balancing strategies, namely, the GDE (Generalized Dimension Exchange) and the SID (Sender Initiated Diffusion) by considering a large set of initial load distributions. These distributions were applied to ring, torus and hypercube topologies, and the number of processors ranged from 8 to 128. From these experiments we have observed that DASUD outperforms the other strategies used in the comparison as it provides the best trade-off between the balance degree obtained at the final state and the number of iterations required to reach this state.","PeriodicalId":335983,"journal":{"name":"Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115144315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9