2008 IEEE International Symposium on Parallel and Distributed Processing: Latest Papers

Balancing HPC applications through smart allocation of resources in MT processors
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536293
C. Boneti, R. Gioiosa, F. Cazorla, J. Corbalán, Jesús Labarta, M. Valero
Abstract: Many studies have shown that load imbalance causes significant performance degradation in high performance computing (HPC) applications. Nowadays, multi-threaded (MT) processors are widely used in HPC for their good performance/energy consumption and performance/cost ratios, achieved by sharing internal resources such as the instruction window or the physical registers. Some of these processors provide software with hardware mechanisms for controlling the allocation of the processor's internal resources. In this paper, we show, for the first time, that by appropriately using these mechanisms, we are able to control the speed of tasks, reducing the imbalance in parallel applications transparently to the user and, hence, reducing the total execution time. Our results show that our proposal leads to a performance improvement of up to 18% for one of the NAS benchmarks. For a real HPC application (much more dynamic than the benchmark), the performance improvement is 8.1%. Our results also show that, if resource allocation is not used properly, the imbalance of applications is worsened, causing performance loss.
Citations: 23
The impact of out-of-order commit in coarse-grain, fine-grain and simultaneous multithreaded architectures
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536284
R. Ubal, J. Sahuquillo, S. Petit, P. López, J. Duato
Abstract: Multithreaded processors in their different organizations (simultaneous, coarse grain and fine grain) have been shown to be effective architectures for reducing issue waste. On the other hand, retiring instructions from the pipeline in an out-of-order fashion helps to unclog the ROB when a long latency instruction reaches its head. This further contributes to maintaining a higher utilization of the available issue bandwidth. In this paper, we evaluate the impact of retiring instructions out of order on different multithreaded architectures and different instruction fetch policies, using the recently proposed Validation Buffer microarchitecture as the baseline out-of-order commit technique. Experimental results show that, for the same performance, out-of-order commit makes it possible to reduce multithreading hardware complexity (e.g., fine grain multithreading with a lower number of supported threads).
Citations: 4
Monitoring for multi-middleware grid
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536209
G. Poghosyan, M. Kunze
Abstract: Within the framework of the German Grid Computing Initiative (D-Grid), we study the monitoring systems and software suites that are used to collect information from computational grids working with single or multiple middleware systems. Based on these investigations, we build prototypes of monitoring systems and implement them in the D-Grid infrastructure. The concept of a Site Check Center (SCC) is suggested to provide a unified interface for access to data from different test-benchmark systems working with more than one middleware software. A vertical hierarchical architecture for exchanging information and building the network of monitoring systems is suggested and employed. A concept for separating consumer-related from resource/service provider-related monitoring information is proposed. Furthermore, we study the integration of monitoring components into a general computational multi-middleware grid infrastructure developed according to specific community needs.
Citations: 3
A survey of concurrent priority queue algorithms
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536331
Kristijan Dragicevic, D. Bauer
Abstract: Algorithms for concurrent data structures have gained attention in recent years as multi-core processors have become ubiquitous. Using the example of a concurrent priority queue, this paper investigates different synchronization methods and concurrent algorithms. It covers traditional lock-based approaches, non-blocking algorithms, and a method based on software transactional memory. Besides discussing correctness criteria for the various approaches, we also present performance results for all algorithms in various scenarios. Somewhat surprisingly, we find that a simple lock-based approach performs reasonably well, even though it does not scale with the number of threads. Better scalability is achieved by non-blocking approaches.
Citations: 28
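To make the "simple lock-based approach" mentioned in the abstract above more concrete, here is a minimal sketch of a priority queue protected by a single global lock. It is not taken from the paper: the class name `LockedPriorityQueue`, the method names, and the `demo` helper are all illustrative assumptions, and the binary heap from Python's standard `heapq` module stands in for whatever structure the authors evaluated.

```python
import heapq
import threading


class LockedPriorityQueue:
    """Priority queue guarded by one global lock (hypothetical name)."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def insert(self, priority, item):
        # Every operation takes the same lock, so operations are serialized.
        with self._lock:
            heapq.heappush(self._heap, (priority, item))

    def delete_min(self):
        # Returns the (priority, item) pair with the smallest priority,
        # or None when the queue is empty.
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)


def demo(num_threads=4, per_thread=100):
    """Insert from several threads, then drain and return the priorities."""
    queue = LockedPriorityQueue()

    def producer(base):
        for i in range(per_thread):
            queue.insert(base + i, f"job-{base + i}")

    threads = [
        threading.Thread(target=producer, args=(t * per_thread,))
        for t in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    drained = []
    while (entry := queue.delete_min()) is not None:
        drained.append(entry[0])
    return drained
```

Because every `insert` and `delete_min` contends for the same lock, adding threads cannot increase throughput, which is consistent with the abstract's remark that this style of design does not scale with the number of threads.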
Parallel, scalable, memory-efficient backtracking for combinatorial modeling of large-scale biological systems
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536180
Byung-Hoon Park, Matthew C. Schmidt, K. Thomas, T. Karpinets, N. Samatova
Abstract: Data-driven modeling of biological systems such as protein-protein interaction networks is data-intensive and combinatorially challenging. Backtracking can constrain a combinatorial search space. Yet, its recursive nature, exacerbated by data-intensity, limits its applicability for large-scale systems. Parallel, scalable, and memory-efficient backtracking is a promising approach. Parallel backtracking suffers from unbalanced loads. Load rebalancing via synchronization and data movement is prohibitively expensive. Balancing these discrepancies, while minimizing end-to-end execution time and memory requirements, is desirable. This paper introduces such a framework. Its scalability and efficiency, demonstrated on the maximal clique enumeration problem, are attributed to the proposed: (a) representation of search tree decomposition to enable parallelization; (b) depth-first parallel search to minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing with stack splitting to minimize processors' idle time. The applications of this framework to real biological problems related to bioethanol production are discussed.
Citations: 3
Scheduling divisible workloads on heterogeneous platforms under bounded multi-port model
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536170
Olivier Beaumont, N. Bonichon, Lionel Eyraud-Dubois
Abstract: In this paper, we discuss complexity issues for scheduling divisible workloads on heterogeneous systems under the bounded multi-port model. To the best of our knowledge, this paper is the first attempt to consider divisible load scheduling under a realistic communication model, where the master node can communicate simultaneously with several slaves, provided that bandwidth constraints are not exceeded. In this paper, we concentrate on one-round distribution schemes, where a given node starts its processing only once all data has been received. Our main contributions are (i) the proof that processors start working immediately after receiving their work, (ii) the study of the optimal schedule in the case of 2 processors, and (iii) the proof that scheduling divisible load under the bounded multi-port model is NP-complete. This last result strongly differs from the divisible load literature and represents the first NP-completeness result when latencies are not taken into account.
Citations: 19
Designing and parameterizing a workflow for optimization: A case study in biomedical imaging
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536411
Vijay S. Kumar, Mary W. Hall, J. Kim, Y. Gil, T. Kurç, E. Deelman, V. Ratnakar, J. Saltz
Abstract: This paper describes our experience to date employing the systematic mapping and optimization of large-scale scientific application workflows to current and future parallel platforms. The overall goal of the project is to integrate a set of system layers (application program, compiler, run-time environment, knowledge representation, optimization framework, and workflow manager). Through a systematic strategy for workflow mapping, our approach will exploit the vast machine resources available in such parallel platforms to dramatically increase the productivity of application programmers. In this paper, we describe the representation of a biomedical imaging application as a workflow, our early experiences in integrating the set of tools brought together for this project, and implications for future applications.
Citations: 2
Synchronized send operations for efficient streaming block I/O over Myrinet
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536142
Evangelos Koukis, Anastassios Nanos, N. Koziris
Abstract: Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypasses the CPU and main memory bus. To overcome the architectural limitation of a low number of outstanding requests, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
Citations: 0
A predicate-based approach to dynamic protocol update in group communication
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536238
Olivier Rütti, A. Schiper
Abstract: In this paper we study dynamic protocol updates (DPU), which consist in replacing, without interruption, a given protocol during execution. We focus especially on group communication protocols. The paper proposes a methodology to conveniently describe which protocols are correctly replaced by a given DPU algorithm. More precisely, our methodology characterizes DPU algorithms by a set of inference rules. To validate our approach, we illustrate our methodology with a new DPU algorithm.
Citations: 5
Receiver-initiated message passing over RDMA Networks
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536262
S. Pakin
Abstract: Providing point-to-point message-passing semantics atop Put/Get hardware traditionally involves implementing a protocol comprising three network latencies. In this paper, we analyze the performance of an alternative implementation approach, receiver-initiated message passing, that eliminates one of the three network latencies. Performance measurements taken on the Cell Broadband Engine indicate that receiver-initiated message passing exhibits substantially lower latency than standard, sender-initiated message passing.
Citations: 60
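The latency argument in the abstract above can be illustrated with a small counting model. The abstract only states that the traditional protocol costs three network latencies and that the receiver-initiated variant removes one; the individual step names below (a ready-to-send notice, a Get request, a ready-to-receive notice, a Put) are assumptions made for illustration, as are the function names and the example latency value.

```python
def sender_initiated_steps():
    # Assumed rendezvous on Put/Get hardware: the sender announces the
    # message, then the receiver issues a Get (request plus data return).
    return [
        "sender -> receiver: ready-to-send notice",
        "receiver -> sender: Get request",
        "sender -> receiver: data",
    ]


def receiver_initiated_steps():
    # Assumed alternative: the receiver announces its buffer first, so the
    # sender can Put the data directly without a separate announcement.
    return [
        "receiver -> sender: ready-to-receive notice",
        "sender -> receiver: Put data",
    ]


def message_time(steps, one_way_latency_us):
    # Idealized cost: each protocol step crosses the network once.
    return len(steps) * one_way_latency_us


if __name__ == "__main__":
    latency_us = 1.5  # hypothetical one-way latency, not a measured value
    for name, steps in [
        ("sender-initiated", sender_initiated_steps()),
        ("receiver-initiated", receiver_initiated_steps()),
    ]:
        print(name, len(steps), "latencies,", message_time(steps, latency_us), "us")
```

The model ignores bandwidth, message size, and any waiting for the peer to post its operation, so it only captures the step count that the abstract describes, not the measured results.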