SC14: International Conference for High Performance Computing, Networking, Storage and Analysis: Latest Papers, Page 6

Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.68

J. Greathouse, Mayank Daga

Citations: 182

Scaling the Power Wall: A Path to Exascale

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.73

Oreste Villa, Daniel R. Johnson, Mike O'Connor, Evgeny Bolotin, D. Nellans, J. Luitjens, Nikolai Sakharnykh, Peng Wang, P. Micikevicius, Anthony Scudiero, S. Keckler, W. Dally

Abstract: Modern scientific discovery is driven by an insatiable demand for computing performance. The HPC community is targeting development of supercomputers able to sustain 1 ExaFlops by the year 2020 and power consumption is the primary obstacle to achieving this goal. A combination of architectural improvements, circuit design, and manufacturing technologies must provide over a 20× improvement in energy efficiency. In this paper, we present some of the progress NVIDIA Research is making toward the design of Exascale systems by tailoring features to address the scaling challenges of performance and energy efficiency. We evaluate several architectural concepts for a set of HPC applications demonstrating expected energy efficiency improvements resulting from circuit and packaging innovations such as low-voltage SRAM, low-energy signalling, and on-package memory. Finally, we discuss the scaling of these features with respect to future process technologies and provide power and performance projections for our Exascale research architecture.

Citations: 126

pTatin3D: High-Performance Methods for Long-Term Lithospheric Dynamics

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.28

D. May, Jed Brown, L. Pourhiet

Abstract: Simulations of long-term lithospheric deformation involve post-failure analysis of high-contrast brittle materials driven by buoyancy and processes at the free surface. Geodynamic phenomena such as subduction and continental rifting take place over million-year time scales and thus require efficient solution methods. We present pTatin3D, a geodynamics modeling package utilising the material-point-method for tracking material composition, combined with a multigrid finite-element method to solve heterogeneous, incompressible visco-plastic Stokes problems. Here we analyze the performance and algorithmic tradeoffs of pTatin3D's multigrid preconditioner. Our matrix-free geometric multigrid preconditioner trades flops for memory bandwidth to produce a time-to-solution > 2× faster than the best available methods utilising stored matrices (plagued by memory bandwidth limitations), exploits local element structure to achieve weak scaling at 30% of FPU peak on Cray XC-30, has improved dynamic range due to smaller memory footprint, and has more consistent timing and better intra-node scalability due to reduced memory-bus and cache pressure.

Citations: 62

A User-Friendly Approach for Tuning Parallel File Operations

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.24

R. McLay, D. James, Si Liu, J. Cazes, W. Barth

Citations: 21

In-Situ Feature Extraction of Large Scale Combustion Simulations Using Segmented Merge Trees

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.88

Aaditya G. Landge, Valerio Pascucci, A. Gyulassy, Janine Bennett, H. Kolla, Jacqueline H. Chen, P. Bremer

Abstract: The ever increasing amount of data generated by scientific simulations coupled with system I/O constraints are fueling a need for in-situ analysis techniques. Of particular interest are approaches that produce reduced data representations while maintaining the ability to redefine, extract, and study features in a post-process to obtain scientific insights. This paper presents two variants of in-situ feature extraction techniques using segmented merge trees, which encode a wide range of threshold based features. The first approach is a fast, low communication cost technique that generates an exact solution but has limited scalability. The second is a scalable, local approximation that nevertheless is guaranteed to correctly extract all features up to a predefined size. We demonstrate both variants using some of the largest combustion simulations available on leadership class supercomputers. Our approach allows state-of-the-art, feature-based analysis to be performed in-situ at significantly higher frequency than currently possible and with negligible impact on the overall simulation runtime.

Citations: 66

Structure Slicing: Extending Logical Regions with Fields

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.74

Michael A. Bauer, Sean Treichler, Elliott Slaughter, A. Aiken

Abstract: Applications on modern supercomputers are increasingly limited by the cost of data movement, but mainstream programming systems have few abstractions for describing the structure of a program's data. Consequently, the burden of managing data movement, placement, and layout currently falls primarily upon the programmer. To address this problem we previously proposed a data model based on logical regions and described Legion, a programming system incorporating logical regions. In this paper, we present structure slicing, which incorporates fields into the logical region data model. We show that structure slicing enables Legion to automatically infer task parallelism from field non-interference, decouple the specification of data usage from layout, and reduce the overall amount of data moved. We demonstrate that structure slicing enables both strong and weak scaling of three Legion applications including S3D, a production combustion simulation that uses logical regions with thousands of fields, with speedups of up to 3.68X over a vectorized CPU-only Fortran implementation and 1.88X over an independently hand-tuned OpenACC code.

Citations: 27

An Image-Based Approach to Extreme Scale in Situ Visualization and Analysis

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.40

J. Ahrens, S. Jourdain, P. O’Leary, J. Patchett, D. Rogers, M. Petersen

Citations: 191

Metascalable Quantum Molecular Dynamics Simulations of Hydrogen-on-Demand

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.59

K. Nomura, R. Kalia, A. Nakano, P. Vashishta, K. Shimamura, F. Shimojo, Manaschai Kunaseth, P. Messina, N. A. Romero

Abstract: We enabled an unprecedented scale of quantum molecular dynamics simulations through algorithmic innovations. A new lean divide-and-conquer density functional theory algorithm significantly reduces the prefactor of the O(N) computational cost based on complexity and error analyses. A globally scalable and locally fast solver hybridizes a global real-space multigrid with local plane-wave bases. The resulting weak-scaling parallel efficiency was 0.984 on 786,432 IBM Blue Gene/Q cores for a 50.3 million-atom (39.8 trillion degrees-of-freedom) system. The time-to-solution was 60 times less than the previous state of the art, owing to enhanced strong scaling by hierarchical band-space domain decomposition and high floating-point performance (50.5% of the peak). Production simulation involving 16,661 atoms for 21,140 time steps (or 129,208 self-consistent-field iterations) revealed a novel nanostructural design for on-demand hydrogen production from water, advancing renewable energy technologies. This metascalable (or "design once, scale on new architectures") algorithm is used for broader applications within a recently proposed divide-conquer-recombine paradigm.

Citations: 14

RAHTM: Routing Algorithm Aware Hierarchical Task Mapping

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.32

Ahmed H. Abdel-Gawad, Mithuna Thottethodi, A. Bhatele

Citations: 20

Finding Constant from Change: Revisiting Network Performance Aware Optimizations on IaaS Clouds

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.85

Yifan Gong, Bingsheng He, Dan Li

Abstract: Network performance aware optimizations have long been an effective approach to optimizing distributed applications on traditional network environments. However, the assumptions of network topology or direct use of several measurements of pair-wise network performance for optimizations are no longer valid on IaaS clouds. Virtualization hides network topology from users, and direct use of network performance measurements may not represent long-term performance. To enable existing network performance aware optimizations on IaaS clouds, we propose to decouple constant component from dynamic network performance while minimizing the difference by a mathematical method called RPCA (Robust Principal Component Analysis). We use the constant component to guide network performance aware optimizations and demonstrate the efficiency of our approach by adopting network aware optimizations for collective communications of MPI and generic topology mapping as well as two real-world applications, N-body and conjugate gradient (CG). Our experiments on Amazon EC2 and simulations demonstrate significant performance improvement on guiding the optimizations.

Citations: 19