{"title":"We have it easy, but do we have it right?","authors":"Amer Diwan","doi":"10.1109/IISWC.2008.4636085","DOIUrl":"https://doi.org/10.1109/IISWC.2008.4636085","url":null,"abstract":"We show two severe problems with the state of the art in empirical computer system performance evaluation, observer effect and measurement context bias, and we outline the path toward a solution.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133225646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workshop 22 introduction: Workshop on Large-Scale Parallel Processing - LSPP","authors":"D. Kerbyson, R. Rajamony, C. Weems, J. Baker, H. Siegel, G. Almási, T. Boku, B. Chapman, H. Dietz, D. Katz, J. Levesque, J. Michalakes, C. Mendes, B. Mohr, Stathis Papaefstathiou, Michael Scherger, R. Walker, H. Wasserman, G. Wellein, P. Worley","doi":"10.1109/IPDPS.2008.4536110","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536110","url":null,"abstract":"The workshop on Large-Scale Parallel Processing is a forum that focuses on computer systems that utilize thousands of processors and beyond. This is a very active area given the goals by many worldwide to enhance science-by-simulation by installing large-scale peta-flop systems at the start of the next decade. Large-scale systems, referred to by some as extreme-scale and Ultra-scale, have many important research aspects that need detailed examination in order for their effective design, deployment, and utilization to take place. These include handling the substantial increase in multi-core on a chip, the ensuing interconnection hierarchy, communication, and synchronization mechanisms. The workshop aims to bring together researchers from different communities working on challenging problems in this area for a dynamic exchange of ideas. Work at early stages of development as well as work that has been demonstrated in practice is equally welcome.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116785507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lightweight scalable I/O utility for optimizing High-End Computing applications","authors":"Shujia Zhou, Bruce H. Van Aartser, T. Clune","doi":"10.1109/IPDPS.2008.4536462","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536462","url":null,"abstract":"Filesystem I/O continues to be a major performance bottleneck for many high-end computing (HEC) applications and in particular for Earth science models, which often generate a relatively large volume of data for a given amount of computational work. The severity of this I/O bottleneck rapidly increases with the number of processors utilized. Consequently, considerable computing resources are wasted, and the sustained performance of HEC applications such as climate and weather models is highly constrained. To alleviate much of this bottleneck, we have developed a lightweight software utility designed to improve performance of typical scientific applications by circumventing bandwidth limitations of typical HEC filesystems. The approach is to exploit the faster inter- processor bandwidth to move output data from compute nodes to designated I/O nodes as quickly as possible, thereby minimizing the I/O wait time. This utility has successfully demonstrated a significant performance improvement within a major NASA weather application.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128684053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of laboratory and computational techniques for optimal and quantitative understanding of cellular metabolic networks","authors":"Xiao-Jiang Feng, J. Rabinowitz, H. Rabitz","doi":"10.1109/IPDPS.2008.4536416","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536416","url":null,"abstract":"This paper summarizes the development of laboratory and computational techniques for systematic and reliable understanding of metabolic networks. By combining a filter-based cell culture system and an optimized metabolite extraction protocol, a broad array of cellular metabolites can be reliably quantified following nutrient and other environment perturbations. A nonlinear closed-loop procedure was also developed for optimal bionetwork model identification. Computational illustrations and laboratory applications clearly demonstrate the capabilities of these techniques in understanding cellular metabolism, especially when they are integrated in an optimal fashion.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128217649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECG segmentation in a body sensor network using Hidden Markov Models","authors":"Huaming Li, Jindong Tan","doi":"10.1109/ISSMDBS.2008.4575075","DOIUrl":"https://doi.org/10.1109/ISSMDBS.2008.4575075","url":null,"abstract":"A novel approach for segmenting ECG signal in a body sensor network employing hidden Markov modeling (HMM) technique is presented. The parameter adaptation in traditional HMM methods is conservative and slow to respond to these beat interval changes. Inadequate and slow parameter adaptation is largely responsible for the low positive predictivity rate. To solve the problem, we introduce an active HMM parameter adaptation and ECG segmentation algorithm. Body sensor networks are used to pre-segment the raw ECG data by performing QRS detection. Instead of one single generic HMM, multiple individualized HMMs are used. Each HMM is only responsible for extracting the characteristic waveforms of the ECG signals with similar temporal features from the same group, so that the temporal parameter adaptation can be naturally achieved.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123790651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Steps toward activity-oriented computing","authors":"J. Sousa, V. Poladian, D. Garlan, B. Schmerl, P. Steenkiste","doi":"10.1109/IPDPS.2008.4536432","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536432","url":null,"abstract":"Most pervasive computing technologies focus on helping users with computer-oriented tasks. In this NSF-funded project, we instead focus on using computers to support user-centered \"activities\" that normally do not involve the use of computers. Examples may include everyday tasks around such as answering the doorbell or doing laundry. A focus on activity-based computing brings to the foreground a number of unique challenges. These include activity definition and representation, system design, interfaces for managing activities, and ensuring robust operation. Our project focuses on the first two challenges.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115237163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Avoiding communication in sparse matrix computations","authors":"J. Demmel, M. Hoemmen, M. Mohiyuddin, K. Yelick","doi":"10.1109/IPDPS.2008.4536305","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536305","url":null,"abstract":"The performance of sparse iterative solvers is typically limited by sparse matrix-vector multiplication, which is itself limited by memory system and network performance. As the gap between computation and communication speed continues to widen, these traditional sparse methods will suffer. In this paper we focus on an alternative building block for sparse iterative solvers, the \"matrix powers kernel\" [x, Ax, A2x, ..., Akx], and show that by organizing computations around this kernel, we can achieve near-minimal communication costs. We consider communication very broadly as both network communication in parallel code and memory hierarchy access in sequential code. In particular, we introduce a parallel algorithm for which the number of messages (total latency cost) is independent of the power k, and a sequential algorithm, that reduces both the number and volume of accesses, so that it is independent of k in both latency and bandwidth costs. This is part of a larger project to develop \"communication-avoiding Krylov subspace methods,\" which also addresses the numerical issues associated with these methods. Our algorithms work for general sparse matrices that \"partition well\". We introduce parallel performance models of matrices arising from 2D and 3D problems and show predicted speedups over a conventional algorithm of up to 7times on a petaflop-scale machine and up to 22times on computation across the grid. Analogous sequential performance models of the same problems predict speedups over a conventional algorithm of up to 10times on an out-of-core implementation, and up to 2.5times when we use our ideas to reduce off-chip latency and bandwidth to DRAM. Finally, we validate the model on an out-of-core sequential implementation and measured a speedup of over 3times, which is close to the predicted speedup.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115646727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A software-hardware hybrid steering mechanism for clustered microarchitectures","authors":"Qiong Cai, J. M. Codina, José González, Antonio González","doi":"10.1109/IPDPS.2008.4536229","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536229","url":null,"abstract":"Clustered microarchitectures provide a promising paradigm to solve or alleviate the problems of increasing microprocessor complexity and wire delays. High- performance out-of-order processors rely on hardware-only steering mechanisms to achieve balanced workload distribution among clusters. However, the additional steering logic results in a significant increase on complexity, which actually decreases the benefits of the clustered design. In this paper, we address this complexity issue and present a novel software-hardware hybrid steering mechanism for out-of-order processors. The proposed software- hardware cooperative scheme makes use of the concept of virtual clusters. Instructions are distributed to virtual clusters at compile time using static properties of the program such as data dependences. Then, at runtime, virtual clusters are mapped into physical clusters by considering workload information. Experiments using SPEC CPU2000 benchmarks show that our hybrid approach can achieve almost the same performance as a state-of-the-art hardware-only steering scheme, while requiring low hardware complexity. In addition, the proposed mechanism outperforms state-of-the-art software-only steering mechanisms by 5% and 10% on average for 2-cluster and 4-cluster machines, respectively.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127184344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving software reliability and productivity via mining program source code","authors":"Tao Xie, Mithun P. Acharya, Suresh Thummalapenta, Kunal Taneja","doi":"10.1109/IPDPS.2008.4536384","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536384","url":null,"abstract":"A software system interacts with third-party libraries through various APIs. Insufficient documentation and constant refactorings of third-party libraries make API library reuse difficult and error prone. Using these library APIs often needs to follow certain usage patterns. These patterns aid developers in addressing commonly faced programming problems such as what checks should precede or follow API calls, how to use a given set of APIs for a given task, or what API method sequence should be used to obtain one object from another. Ordering rules (specifications) also exist between APIs, and these rules govern the secure and robust operation of the system using these APIs. These patterns and rules may not be well documented by the API developers. Furthermore, usage patterns and specifications might change with library refactorings, requiring changes in the software that reuse the library. To address these issues, we develop novel techniques (and their supporting tools) based on mining source code, assisting developers in productively reusing third party libraries to build reliable and secure software.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125184595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand","authors":"Matthew J. Koop, T. Jones, D. Panda","doi":"10.1109/IPDPS.2008.4536283","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536283","url":null,"abstract":"The need for computational cycles continues to exceed availability, driving commodity clusters to increasing scales. With upcoming clusters containing tens-of-thousands of cores, InfiniBand is a popular interconnect on these clusters, due to its low latency (1.5 musec) and high bandwidth (1.5 GB/sec). Since most scientific applications running on these clusters are written using the message passing interface (MPI) as the parallel programming model, the MPI library plays a key role in the performance and scalability of the system. Nearly all MPIs implemented over InfiniBand currently use the reliable connection (RC) transport of InfiniBand to implement message passing. Using this transport exclusively, however, has been shown to potentially reach a memory footprint of over 200 MB/task at 16 K tasks for the MPI library. The Unreliable Datagram (UD) transport, however, offers higher scalability, but at the cost of medium and large message performance. In this paper we present a multi-transport MPI design, MVAPICH-Aptus, that uses both the RC and UD transports of InfiniBand to deliver scalability and performance higher than that of a single-transport MPI design. Evaluation of our hybrid design on 512 cores shows a 12% improvement over an RC-based design and 4% better than a UD-based design for the SMG2000 application benchmark. In addition, for the molecular dynamics application NAMD we show a 10% improvement over an RC-only design. To the best of our knowledge, this is the first such analysis and design of optimized MPI using both UD and RC.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125909075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}