Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis: Latest Articles, Page 7

Beyond homogeneous decomposition: scaling long-range forces on Massively Parallel Systems

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654121

D. Richards, J. Glosli, B. Chan, M. Dorr, E. Draeger, J. Fattebert, W. D. Krauss, T. Spelce, F. Streitz, M. Surh, John A. Gunnels

Abstract: With supercomputers anticipated to expand from thousands to millions of cores, one of the challenges facing scientists is how to effectively utilize this ever-increasing number. We report here an approach that creates a heterogeneous decomposition by partitioning effort according to the scaling properties of the component algorithms. We demonstrate our strategy by developing a capability to model hot dense plasma. We have performed benchmark calculations ranging from millions to billions of charged particles, including a 2.8 billion particle simulation that achieved 259.9 TFlop/s (26% of peak performance) on the 294,912-CPU JUGENE computer at the Jülich Supercomputing Centre in Germany. With this unprecedented simulation capability we have begun an investigation of plasma fusion physics under conditions where both theory and experiment are lacking: in the strongly coupled regime as the plasma begins to burn. Our strategy is applicable to other problems involving long-range forces (e.g., biological or astrophysical simulations). We believe that the flexible heterogeneous decomposition approach demonstrated here will allow many problems to scale across current and next-generation machines.
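The idea of partitioning effort according to the scaling properties of the component algorithms can be illustrated with a toy model. The sketch below is not the authors' method: the two cost functions, their constants, and the name `best_split` are hypothetical, and it assumes the two components run concurrently on disjoint groups of ranks so that the step time is the slower of the two.

```python
# Toy illustration of a heterogeneous decomposition (all numbers are made up).
# Assumption: two components run concurrently on disjoint rank groups,
# so one time step costs max(cost_a, cost_b).

def short_range_cost(ranks):
    """Hypothetical component that scales well: work divides evenly."""
    return 1000.0 / ranks

def long_range_cost(ranks):
    """Hypothetical component whose scaling saturates:
    a communication term grows with the number of ranks."""
    return 200.0 / ranks + 0.05 * ranks

def best_split(total_ranks, cost_a, cost_b):
    """Try every split of total_ranks into two non-empty groups and
    return (ranks_a, ranks_b, step_time) with the smallest step time."""
    best = None
    for ranks_a in range(1, total_ranks):
        ranks_b = total_ranks - ranks_a
        step_time = max(cost_a(ranks_a), cost_b(ranks_b))
        if best is None or step_time < best[2]:
            best = (ranks_a, ranks_b, step_time)
    return best

if __name__ == "__main__":
    total = 1024
    ranks_a, ranks_b, step_time = best_split(total, short_range_cost, long_range_cost)
    # Homogeneous baseline: every rank runs both components back to back.
    homogeneous = short_range_cost(total) + long_range_cost(total)
    print(f"heterogeneous: {ranks_a} + {ranks_b} ranks, step time {step_time:.2f}")
    print(f"homogeneous:   {total} ranks, step time {homogeneous:.2f}")
```

In this model the poorly scaling component is confined to a small group of ranks where it runs near its optimum, while the remaining ranks go to the component that keeps benefiting from more parallelism.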

Citations: 32

A microdriver architecture for error correcting codes inside the Linux kernel

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654095

A. Brinkmann, D. Eschweiler

Abstract: Coding tasks, such as encryption of data or the generation of failure-tolerant codes, are among the most computationally expensive tasks inside the Linux kernel. Their integration into the kernel enables the user to access this functionality transparently; for example, encrypted hard disks can be used in the same way as unencrypted ones. Nevertheless, Linux as a monolithic kernel is not prepared to support these expensive tasks by accessing modern hardware accelerators, like graphics processing units (GPUs), as the corresponding accelerator libraries, like the CUDA API for NVIDIA GPUs, only offer user-space APIs. Linux is often used in conjunction with parallel file systems in high performance cluster environments, and the tremendous storage growth in these environments leads to the requirement of multi-error correcting codes. Parallel file systems, which often run on a storage cluster, are required to store the calculated results without huge waiting times. Whereas the frontend of such a storage cluster can be built with standard PCs, it has so far been nearly impossible to build a capable RAID backend with end user hardware. This work investigated the potential of graphics cards for coding applications such as RAID in the Linux kernel. For this purpose, a special microdriver concept (Barracuda) has been designed that can be integrated into Linux without changing kernel APIs. To investigate the performance of this concept, the Linux RAID 6 system and the applied Reed-Solomon code have been extended as an example and studied. The resulting measurements outline opportunities and limitations of our microdriver concept. The concept achieves a speed-up of 72 for complex, 8-failure correcting codes, whereas no additional speed-up can be generated for simpler, 2-error correcting codes. An example application for Barracuda could therefore be the replacement of expensive RAID systems in cluster storage environments.
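To make the coding workload concrete, the following is a minimal sketch of RAID 6 style P/Q parity over GF(2^8). It assumes the field polynomial 0x11D and generator 2 that are conventional for RAID 6; the abstract does not state these details. It says nothing about how Barracuda offloads the work to a GPU, the function names are illustrative only, and codes that tolerate more failures need additional parity blocks beyond P and Q.

```python
# Sketch of RAID 6 style P/Q parity in GF(2^8).
# Assumptions (not taken from the abstract): polynomial 0x11D, generator 2.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Multiplicative inverse of a nonzero byte: a^254 == a^-1 in GF(2^8)."""
    return gf_pow(a, 254)

def pq_parity(blocks):
    """Return (P, Q) for equally sized data blocks.
    P is the plain XOR; Q weights block i by 2^i before XOR-ing."""
    length = len(blocks[0])
    p = bytearray(length)
    q = bytearray(length)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_from_q(blocks, missing, q):
    """Rebuild blocks[missing] from Q and the surviving data blocks."""
    partial = bytearray(q)
    for i, block in enumerate(blocks):
        if i == missing:
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            partial[j] ^= gf_mul(coeff, byte)
    inv = gf_inv(gf_pow(2, missing))
    return bytes(gf_mul(inv, b) for b in partial)

if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    p, q = pq_parity(data)
    damaged = [data[0], None, data[2]]  # pretend block 1 was lost
    print(recover_from_q(damaged, 1, q) == data[1])
```

Every output byte depends only on the input bytes at the same offset, so the work is data-parallel across offsets, which is the kind of structure an accelerator can exploit.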

Citations: 11

FACT: fast communication trace collection for parallel applications through program slicing

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654087

Jidong Zhai, Tianwei Sheng, Jiangzhou He, Wenguang Chen, Weimin Zheng

Abstract: A proper understanding of communication patterns of parallel applications is important to optimize application performance and design better communication subsystems. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generating communication traces need to execute the entire parallel application on a full-scale system, which is time-consuming and expensive. In this paper, we propose a novel technique, called Fact, which can perform FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to obtain a program slice through static analysis, and to execute the program slice to acquire the communication traces. The program slice preserves all the variables and statements in the original program relevant to spatial and volume communication attributes. Our idea is based on an observation that most computation and message contents in message-passing parallel applications are independent of these attributes, and therefore can be removed from the programs for the purpose of communication trace collection. We have implemented Fact and evaluated it with NPB programs and Sweep3D. The results show that Fact can preserve the spatial and volume communication attributes of original programs and reduce resource consumption by two orders of magnitude in most cases. For example, Fact collects the communication traces of Sweep3D for 512 processes on a 4-node (32 cores) platform in just 6.79 seconds, consuming 1.25 GB memory, while the original program takes 256.63 seconds and consumes 213.83 GB memory on a 32-node (512 cores) platform. Finally, we present an application of Fact.
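The observation behind the slice can be shown with a toy example. In the sketch below, the slice is written by hand and the program, the trace format, and all names are hypothetical; Fact itself derives the slice through static analysis and runs real message-passing programs. The point is only that the destination and size of each message do not depend on the message contents, so the buffers and the arithmetic on them can be dropped.

```python
# Toy illustration of slicing for communication traces (hand-written slice,
# hypothetical program; no real message passing takes place).

def original_program(rank, nprocs, n, trace):
    """Ring exchange that builds and updates a buffer before each send."""
    data = [float(i) * rank for i in range(n)]      # message contents
    for step in range(3):
        data = [x * 0.5 + 1.0 for x in data]        # computation on the contents
        dest = (rank + 1) % nprocs                  # spatial attribute
        size_bytes = len(data) * 8                  # volume attribute
        trace.append((step, rank, dest, size_bytes))

def sliced_program(rank, nprocs, n, trace):
    """Slice keeping only statements that affect destination and size."""
    for step in range(3):
        dest = (rank + 1) % nprocs
        size_bytes = n * 8
        trace.append((step, rank, dest, size_bytes))

def collect(program, nprocs, n):
    """Run the given program once per rank and gather the trace records."""
    trace = []
    for rank in range(nprocs):
        program(rank, nprocs, n, trace)
    return trace

if __name__ == "__main__":
    full = collect(original_program, nprocs=8, n=10_000)
    sliced = collect(sliced_program, nprocs=8, n=10_000)
    print(full == sliced)
```

The sliced version never allocates the buffer, which is where the memory and time savings in this toy come from.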

Citations: 30

Flexible cache error protection using an ECC FIFO

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654109

D. Yoon, M. Erez

Citations: 28

Scalable computation of streamlines on very large datasets

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654076

D. Pugmire, H. Childs, C. Garth, Sean Ahern, G. Weber

Citations: 87

Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654105

Lakshminarayanan Renganarayanan, Uday Bondhugula, Salem Derisavi, A. Eichenberger, K. O'Brien

Citations: 15

Future scaling of processor-memory interfaces

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654102

Jung Ho Ahn, N. Jouppi, C. Kozyrakis, J. Leverich, R. Schreiber

Citations: 121

Millisecond-scale molecular dynamics simulations on Anton

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI: 10.1145/1654059.1654126

D. Shaw, R. Dror, J. Salmon, J. P. Grossman, Kenneth M. Mackenzie, Joseph A. Bank, C. Young, Martin M. Deneroff, Brannon Batson, K. Bowers, Edmond Chow, M. Eastwood, D. Ierardi, J. L. Klepeis, J. Kuskin, Richard H. Larson, K. Lindorff-Larsen, P. Maragakis, Mark A. Moraes, S. Piana, Yibing Shan, Brian Towles

Abstract: Anton is a recently completed special-purpose supercomputer designed for molecular dynamics (MD) simulations of biomolecular systems. The machine's specialized hardware dramatically increases the speed of MD calculations, making possible for the first time the simulation of biological molecules at an atomic level of detail for periods on the order of a millisecond, about two orders of magnitude beyond the previous state of the art. Anton is now running simulations on a timescale at which many critically important but poorly understood phenomena are known to occur, allowing the observation of aspects of protein dynamics that were previously inaccessible to both computational and experimental study. Here, we report Anton's performance when executing actual MD simulations whose accuracy has been validated against both existing MD software and experimental observations. We also discuss the manner in which novel algorithms have been coordinated with Anton's co-designed, application-specific hardware to achieve these results.

Citations: 101