SC14: International Conference for High Performance Computing, Networking, Storage and Analysis最新文献

IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion IndexFS:通过无状态缓存和大容量插入扩展文件系统元数据性能

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.25

Kai Ren, Qing Zheng, Swapnil Patil, Garth A. Gibson

{"title":"IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion","authors":"Kai Ren, Qing Zheng, Swapnil Patil, Garth A. Gibson","doi":"10.1109/SC.2014.25","DOIUrl":"https://doi.org/10.1109/SC.2014.25","url":null,"abstract":"The growing size of modern storage systems is expected to exceed billions of objects, making metadata scalability critical to overall performance. Many existing distributed file systems only focus on providing highly parallel fast access to file data, and lack a scalable metadata service. In this paper, we introduce a middleware design called Index FS that adds support to existing file systems such as PVFS, Lustre, and HDFS for scalable high-performance operations on metadata and small files. Index FS uses a table-based architecture that incrementally partitions the namespace on a per-directory basis, preserving server and disk locality for small directories. An optimized log-structured layout is used to store metadata and small files efficiently. We also propose two client-based storm free caching techniques: bulk namespace insertion for creation intensive workloads such as N-N check pointing, and stateless consistent metadata caching for hot spot mitigation. By combining these techniques, we have demonstrated Index FS scaled to 128 metadata servers. Experiments show our out-of-core metadata throughput out-performing existing solutions such as PVFS, Lustre, and HDFS by 50% to two orders of magnitude.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"41 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120864000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 146

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster CPU/GPU混合集群中避免通信的Krylov方法的域分解前置条件

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.81

I. Yamazaki, S. Rajamanickam, E. Boman, M. Hoemmen, M. Heroux, S. Tomov

{"title":"Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster","authors":"I. Yamazaki, S. Rajamanickam, E. Boman, M. Hoemmen, M. Heroux, S. Tomov","doi":"10.1109/SC.2014.81","DOIUrl":"https://doi.org/10.1109/SC.2014.81","url":null,"abstract":"Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CAGMRES, for solving no symmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the sub domain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieve a speedup of up to 7.4x over CAGMRES without preconditioning, and speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124938877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 29

Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization 猎户座:缩放基因组序列匹配与细粒度并行

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.42

K. Mahadik, S. Chaterji, Bowen Zhou, Milind Kulkarni, S. Bagchi

引用次数: 22

Pardicle: Parallel Approximate Density-Based Clustering 粒子:并行近似密度聚类

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.51

Md. Mostofa Ali Patwary, N. Satish, N. Sundaram, F. Manne, S. Habib, P. Dubey

{"title":"Pardicle: Parallel Approximate Density-Based Clustering","authors":"Md. Mostofa Ali Patwary, N. Satish, N. Sundaram, F. Manne, S. Habib, P. Dubey","doi":"10.1109/SC.2014.51","DOIUrl":"https://doi.org/10.1109/SC.2014.51","url":null,"abstract":"DBSCAN is a widely used is density-based clustering algorithm for particle data well-known for its ability to isolate arbitrarily-shaped clusters and to filter noise data. The algorithm is super-linear (O(nlogn)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density based sampling, which performs equally well in quality compared to exact algorithms, but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) shows that our approximate algorithm is up to 56× faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We develop a new parallel DBSCAN algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared memory (15× using 16 cores, single node Intel® Xeon® processor) and distributed memory (3917× using 4096 cores, multinode) computers, with 2× additional performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to 3.4 times speedup using dynamic partitioning.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128530323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 36

High-Productivity Framework on GPU-Rich Supercomputers for Operational Weather Prediction Code ASUCA 操作天气预报代码ASUCA在gpu丰富的超级计算机上的高生产力框架

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.26

T. Shimokawabe, T. Aoki, Naoyuki Onodera

{"title":"High-Productivity Framework on GPU-Rich Supercomputers for Operational Weather Prediction Code ASUCA","authors":"T. Shimokawabe, T. Aoki, Naoyuki Onodera","doi":"10.1109/SC.2014.26","DOIUrl":"https://doi.org/10.1109/SC.2014.26","url":null,"abstract":"The weather prediction code demands large computational performance to achieve fast and high-resolution simulations. Skillful programming techniques are required for obtaining good parallel efficiency on GPU supercomputers. Our framework-based weather prediction code ASUCA has achieved good scalability with hiding complicated implementation and optimizations required for distributed GPUs, contributing to increasing the maintainability, ASUCA is a next-generation high resolution meso-scale atmospheric model being developed by the Japan Meteorological Agency. Our framework automatically translates user-written stencil functions that update grid points and generates both GPU and CPU codes. User-written codes are parallelized by MPI with intra-node GPU peer-to-peer direct access. These codes can easily utilize optimizations such as overlapping technique to hide communication overhead by computation. Our simulations on the GPU-rich supercomputer TSUBAME 2.5 at the Tokyo Institute of Technology have demonstrated good strong and weak scalability achieving 209.6 TFlops in single precision for our largest model using 4,108 NVIDIA K20X GPUs.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128287907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs 24.77 pflop在引力树上-代码模拟银河系与18600 gpu

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.10

J. Bédorf, E. Gaburov, M. Fujii, Keigo Nitadori, T. Ishiyama, S. Zwart

引用次数: 57

Physics-Based Urban Earthquake Simulation Enhanced by 10.7 BlnDOF × 30 K Time-Step Unstructured FE Non-Linear Seismic Wave Simulation 10.7 BlnDOF × 30k时间步非结构有限元非线性地震波模拟增强的基于物理的城市地震模拟

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.7

T. Ichimura, K. Fujita, Seizo Tanaka, M. Hori, Lalith Maddegedara, Y. Shizawa, Hiroshi Kobayashi

引用次数: 56

Quantitatively Modeling Application Resilience with the Data Vulnerability Factor 基于数据脆弱性因子的应用弹性定量建模

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.62

Li Yu, Dong Li, Sparsh Mittal, J. Vetter

{"title":"Quantitatively Modeling Application Resilience with the Data Vulnerability Factor","authors":"Li Yu, Dong Li, Sparsh Mittal, J. Vetter","doi":"10.1109/SC.2014.62","DOIUrl":"https://doi.org/10.1109/SC.2014.62","url":null,"abstract":"Recent strategies to improve the observable resilience of applications require the ability to classify vulnerabilities of individual components (e.g., Data structures, instructions) of an application, and then, selectively apply protection mechanisms to its critical components. To facilitate this vulnerability classification, it is important to have accurate, quantitative techniques that can be applied uniformly and automatically across real-world applications. Traditional methods cannot effectively quantify vulnerability, because they lack a holistic view to examine system resilience, and come with prohibitive evaluation costs. In this paper, we introduce a data-driven, practical methodology to analyze these application vulnerabilities using a novel resilience metric: the data vulnerability factor (DVF). DVF integrates knowledge from both the application and target hardware into the calculation. To calculate DVF, we extend a performance modeling language to provide a structured, fast modeling solution. We evaluate our methodology on six representative computational kernels, we demonstrate the significance of DVF by quantifying the impact of algorithm optimization on vulnerability, and by quantifying the effectiveness of specific hardware protection mechanisms.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"319 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130149442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 35

Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction Omnisc'IO:基于语法的空间和时间I/O模式预测方法

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.56

Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, R. Ross

引用次数: 43

Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers 在异构超级计算机上的千万亿次高阶动态破裂地震模拟

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.6

A. Heinecke, Alexander Breuer, Sebastian Rettenberger, M. Bader, A. Gabriel, C. Pelties, A. Bode, W. Barth, Xiangke Liao, K. Vaidyanathan, M. Smelyanskiy, P. Dubey

{"title":"Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers","authors":"A. Heinecke, Alexander Breuer, Sebastian Rettenberger, M. Bader, A. Gabriel, C. Pelties, A. Bode, W. Barth, Xiangke Liao, K. Vaidyanathan, M. Smelyanskiy, P. Dubey","doi":"10.1109/SC.2014.6","DOIUrl":"https://doi.org/10.1109/SC.2014.6","url":null,"abstract":"We present an end-to-end optimization of the innovative Arbitrary high-order DERivative Discontinuous Galerkin (ADER-DG) software SeisSol targeting Intel® Xeon Phi coprocessor platforms, achieving unprecedented earthquake model complexity through coupled simulation of full frictional sliding and seismic wave propagation. SeisSol exploits unstructured meshes to flexibly adapt for complicated geometries in realistic geological models. Seismic wave propagation is solved simultaneously with earthquake faulting in a multiphysical manner leading to a heterogeneous solver structure. Our architecture aware optimizations deliver up to 50% of peak performance, and introduce an efficient compute-communication overlapping scheme shadowing the multiphysics computations. SeisSol delivers near-optimal weak scaling, reaching 8.6 DP-PFLOPS on 8,192 nodes of the Tianhe-2 supercomputer. Our performance model projects reaching 18 -- 20 DP-PFLOPS on the full Tianhe-2 machine. Of special relevance to modern civil engineering needs, our pioneering simulation of the 1992 Landers earthquake shows highly detailed rupture evolution and ground motion at frequencies up to 10 Hz.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"680 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127584541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 119