2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC): Latest Publications

27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) Technical Program
DOI: 10.1109/hipc50609.2020.00013
Citations: 0
Understanding HPC Application I/O Behavior Using System Level Statistics
A. Paul, Olaf Faaland, A. Moody, Elsa Gonsiorowski, K. Mohror, A. Butt
DOI: 10.1109/HiPC50609.2020.00034
Abstract: The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. There is therefore a need for improvements in storage and file system designs to meet the ever-growing I/O needs of HPC applications, and storage and file system designers require a deep understanding of how HPC application I/O behavior affects current storage system installations in order to improve them. In this work, we contribute to this understanding using application-agnostic file system statistics gathered on compute nodes as well as on metadata and object storage file system servers. We analyze file system statistics of more than 4 million jobs over a period of three years on two systems at Lawrence Livermore National Laboratory that include a 15 PiB Lustre file system for storage. The results of our study add to the state of the art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale storage systems. Key observations from our study show that reads and writes are evenly distributed across the storage system; applications that perform I/O spread it across ~78% of the minutes of their runtime on average; fewer than 22% of HPC users who submit write-intensive jobs perform efficient writes to the file system; and I/O contention seriously impacts I/O performance.
Citations: 14
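A headline metric from this study is the fraction of a job's runtime minutes during which any I/O occurs (the ~78% figure above). The sketch below shows how such a metric could be computed from per-minute, per-job counters; the record format here is a hypothetical stand-in for the authors' application-agnostic file system statistics, not their actual tooling.

```python
from collections import defaultdict

def io_activity_fraction(records):
    """Fraction of each job's runtime minutes that saw any I/O.

    `records` is an iterable of (job_id, minute, bytes_read, bytes_written)
    tuples -- a hypothetical stand-in for per-minute file system counters.
    """
    minutes_seen = defaultdict(set)     # every minute a job was running
    minutes_with_io = defaultdict(set)  # minutes with nonzero I/O
    for job_id, minute, rd, wr in records:
        minutes_seen[job_id].add(minute)
        if rd > 0 or wr > 0:
            minutes_with_io[job_id].add(minute)
    return {job: len(minutes_with_io[job]) / len(minutes_seen[job])
            for job in minutes_seen}

# Example: a job sampled over 4 minutes, with I/O in 3 of them (-> 0.75).
records = [
    ("job42", 0, 1024, 0),
    ("job42", 1, 0, 0),
    ("job42", 2, 0, 2048),
    ("job42", 3, 512, 512),
]
print(io_activity_fraction(records))  # {'job42': 0.75}
```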
GPU-FPtuner: Mixed-precision Auto-tuning for Floating-point Applications on GPU
Ruidong Gu, M. Becchi
DOI: 10.1109/HiPC50609.2020.00043
Abstract: GPUs have been extensively used to accelerate scientific applications from a variety of domains: computational fluid dynamics, astronomy and astrophysics, climate modeling, and numerical analysis, to name a few. Many of these applications rely on floating-point arithmetic, which is approximate in nature. High-precision libraries have been proposed to mitigate accuracy issues due to the use of floating-point arithmetic; however, these libraries offer increased accuracy at a significant performance cost. Previous work, primarily focusing on CPU code and on standard IEEE floating-point data types, has explored mixed precision as a compromise between performance and accuracy. In this work, we propose a mixed-precision autotuner for GPU applications that rely on floating-point arithmetic. Our tool supports standard 32- and 64-bit floating-point arithmetic, as well as high precision through the QD library. Our autotuner relies on compiler analysis to reduce the size of the tuning space. In particular, our tuning strategy takes into account code patterns prone to error propagation and GPU-specific considerations to generate a tuning plan that balances performance and accuracy. Our autotuner pipeline, implemented using the ROSE compiler and Python scripts, is fully automated, and the code is available as open source. Our experimental results, collected on benchmark applications of various code complexities, show the performance-accuracy tradeoffs of these applications and the effectiveness of our tool in identifying representative tuning points.
Citations: 3
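The core loop of any such autotuner is: pick a candidate precision assignment, evaluate it, and compare the result against a high-precision reference. Below is a toy NumPy sketch of a greedy variant of that loop; the three-variable kernel and the fixed demotion order are illustrative assumptions, not GPU-FPtuner's ROSE-based compiler analysis or its GPU code generation.

```python
import numpy as np

def kernel(x, prec):
    # Toy computation whose intermediate precisions are tunable: `prec`
    # maps each variable name to np.float32 or np.float64.
    a = prec["a"](np.sum(x * x))
    b = prec["b"](np.sqrt(a) + 1e-8)
    return float(prec["c"](a / b))

def greedy_tune(x, tol):
    # Greedily demote variables to float32, keeping each demotion only if
    # the result stays within `tol` of the all-float64 reference.
    ref = kernel(x, {v: np.float64 for v in "abc"})
    prec = {v: np.float64 for v in "abc"}
    for v in "abc":
        trial = dict(prec, **{v: np.float32})
        if abs(kernel(x, trial) - ref) <= tol * abs(ref):
            prec = trial  # demotion kept: accuracy still acceptable
    return prec

x = np.linspace(0.0, 1.0, 10_000)
tuned = greedy_tune(x, tol=1e-6)
print({v: t.__name__ for v, t in tuned.items()})
```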
On the Marriage of Asynchronous Many Task Runtimes and Big Data: A Glance
Joshua D. Suetterlein, J. Manzano, A. Márquez, G. Gao
DOI: 10.1109/HiPC50609.2020.00037
Abstract: The rise of accelerator-based architectures and reconfigurable computing has exposed the weakness of software toolchains that maintain a static view of the hardware instead of relying on a symbiotic relationship between static tools (e.g., compilers) and dynamic tools (e.g., runtimes). In past decades, this need has given rise to adaptive runtimes with increasingly finer computational tasks. These finer tasks help take advantage of the hardware by switching out when a long-latency operation is encountered (because of deeper memory hierarchies and new memory technologies that may target streaming instead of random access), thus trading idle time for unrelated work. Examples of such runtimes are Asynchronous Many Task (AMT) runtimes, in which highly efficient computational graphs run on a variety of hardware. Due to their inherent latency tolerance, latency-sensitive applications such as graph analytics and Big Data can use these runtimes effectively. This paper presents an example of how the careful design of an AMT can exploit the hardware substrate when faced with high-latency applications such as those in the Big Data domain. Moreover, we aim to show the power of these runtimes, with their introspection and adaptive capabilities, when facing the changing requirements of application workloads. We use the Performance Open Community Runtime (P-OCR) as our vehicle to demonstrate the concepts presented here.
Citations: 1
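The latency-trading behavior described above can be illustrated with ordinary Python coroutines: a task that hits a long-latency operation yields control, and the scheduler fills the idle time with unrelated work. This asyncio sketch is only an analogy for the much finer-grained task scheduling of an AMT such as P-OCR.

```python
import asyncio
import time

async def fetch_remote(key):
    # Stand-in for a long-latency operation (e.g., a remote data access).
    # Awaiting it yields control so other tasks can run in the meantime.
    await asyncio.sleep(0.5)
    return f"value-of-{key}"

async def compute(i):
    # Unrelated fine-grained work that fills the otherwise idle time.
    return sum(k * k for k in range(10_000 + i))

async def main():
    t0 = time.perf_counter()
    results = await asyncio.gather(
        fetch_remote("A"), fetch_remote("B"),
        *(compute(i) for i in range(4)),
    )
    # The two fetches overlap each other and the compute tasks, so the
    # total time is ~0.5 s rather than the ~1.0 s of a serial schedule.
    print(f"{len(results)} tasks in {time.perf_counter() - t0:.2f}s")

asyncio.run(main())
```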
Pipelined Preconditioned Conjugate Gradient Methods for Distributed Memory Systems
Manas Tiwari, Sathish S. Vadhiyar
DOI: 10.1109/HiPC50609.2020.00029
Abstract: The Preconditioned Conjugate Gradient (PCG) method is one of the most widely used methods for solving sparse linear systems of equations. Pipelined PCG (PIPECG) eliminates dependencies in the computations of the PCG algorithm and overlaps non-dependent computations by reorganizing the traditional PCG code and using non-blocking allreduces. We have developed a novel pipelined PCG algorithm called PIPECG-OATI (One Allreduce per Two Iterations) that provides a large overlap of global communication and computation at higher core counts on distributed-memory CPU systems. Our method achieves this overlap by combining iterations and by introducing new non-recurrence computations. We compare our method with other pipelined CG methods on a variety of problems and demonstrate that it consistently gives the lowest runtimes. Our method gives up to 3x speedup over the PCG method and 1.73x speedup over the PIPECG method at large core counts.
Citations: 0
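The mechanism that pipelined CG methods build on is overlapping a non-blocking allreduce (for the global dot products) with independent local work such as the sparse matrix-vector product. A minimal mpi4py sketch of that pattern follows; it shows the overlap only, not the full PIPECG-OATI recurrence, and the matrix and vectors are synthetic.

```python
# Run with, e.g.: mpirun -n 4 python pipelined_overlap.py
import numpy as np
from mpi4py import MPI
from scipy.sparse import random as sprandom

comm = MPI.COMM_WORLD
n = 100_000
r = np.random.default_rng(comm.Get_rank()).standard_normal(n)
A = sprandom(n, n, density=1e-4, format="csr", random_state=0)

local_dot = np.array([r @ r])
global_dot = np.empty(1)
# Start the global reduction without blocking ...
req = comm.Iallreduce(local_dot, global_dot, op=MPI.SUM)

# ... and overlap it with an independent local computation (SpMV).
w = A @ r

req.Wait()  # the global dot product is now available
print(f"rank {comm.Get_rank()}: sum ||r||^2 = {global_dot[0]:.3e}, w[0] = {w[0]:.3e}")
```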
[Title page]
DOI: 10.1109/hipc50609.2020.00002
Citations: 0
Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications
A. Shafi, J. Hashmi, H. Subramoni, D. Panda
DOI: 10.1109/HiPC50609.2020.00025
Abstract: Python is emerging as a popular language in the data science community due to its ease of use, vibrant community, and rich set of libraries. Dask is a popular Python-based distributed computing framework that allows users to process large amounts of data on parallel hardware. The Dask distributed package is a non-blocking, asynchronous, and concurrent library that supports distributed execution of tasks in datacenter and HPC environments. A key requirement in designing high-performance communication backends for Dask distributed is scalable support for coroutines, which, unlike regular Python functions, can only be invoked from asynchronous applications. In this paper, we present Blink, a high-performance communication library for Dask on high-performance RDMA networks like InfiniBand. Blink offers a multi-layered architecture that matches the communication requirements of Dask and exploits high-performance interconnects using a Cython wrapper layer over the C backend. We evaluate the performance of Blink against other counterparts using various micro-benchmarks and application kernels on three cluster testbeds with varying interconnect speeds. Our micro-benchmark evaluation reveals that Blink outperforms other communication backends by more than 3x for message sizes ranging from 1 Byte to 64 KByte, and by a factor of 2x for message sizes ranging from 128 KByte to 8 MByte. Using various application-level evaluations, we demonstrate that Dask achieves up to 7% improvement in application throughput (e.g., total worker throughput).
Citations: 1
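For context, the sketch below shows the coroutine-style Dask distributed usage that a communication backend like Blink must serve: client creation, task submission, and result gathering all run inside an event loop rather than blocking. It uses a local in-process cluster as a stand-in; Blink itself targets RDMA interconnects such as InfiniBand.

```python
import asyncio
from dask.distributed import Client

def square(x):
    return x * x

async def main():
    # An asynchronous client: its operations are awaitable coroutines,
    # which is exactly the calling convention a backend must support.
    async with Client(asynchronous=True, processes=False) as client:
        futures = client.map(square, range(8))   # schedule tasks
        results = await client.gather(futures)   # await without blocking
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]

asyncio.run(main())
```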
Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization
Lawton Manning, Grey Ballard, R. Kannan, Haesun Park
DOI: 10.1109/HiPC50609.2020.00028
Abstract: Nonnegative Matrix Factorization (NMF) is an effective tool for clustering nonnegative data, either for computing a flat partitioning of a dataset or for determining a hierarchy of similarity. In this paper, we propose a parallel algorithm for hierarchical clustering that uses a divide-and-conquer approach based on rank-two NMF to split a data set into two cohesive parts. Not only does this approach uncover more structure in the data than a flat NMF clustering, but rank-two NMF can also be computed more quickly than NMF for general ranks, providing comparable overall time to solution. Our data distribution and parallelization strategies are designed to maintain computational load balance throughout the data-dependent hierarchy of computation while limiting interprocess communication, allowing the algorithm to scale to large dense and sparse data sets. We demonstrate the scalability of our parallel algorithm in terms of data size (up to 800 GB) and number of processors (up to 80 nodes of the Summit supercomputer), applying the hierarchical clustering approach to hyperspectral imaging and image classification data. Our rank-two NMF algorithm scales perfectly to thousands of cores, and the entire hierarchical clustering method achieves a 5.9x speedup scaling from 10 to 80 nodes on the 800 GB dataset.
Citations: 3
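A serial sketch of the divide-and-conquer idea follows: split the rows of a nonnegative matrix with a rank-two NMF, assign each row to the factor column in which it has the larger weight, and recurse on each half. scikit-learn's dense NMF stands in for the paper's parallel rank-two solver, and the splitting rule here is the straightforward one, not necessarily the authors' exact criterion.

```python
import numpy as np
from sklearn.decomposition import NMF

def hier_cluster(X, idx, depth, leaves):
    # Recursively bisect the rows X[idx] until `depth` levels are built
    # or a split fails; leaf index sets are collected in `leaves`.
    if depth == 0 or len(idx) < 2:
        leaves.append(idx)
        return
    W = NMF(n_components=2, init="nndsvda", max_iter=300).fit_transform(X[idx])
    left = idx[W[:, 0] >= W[:, 1]]   # rows dominated by factor 0
    right = idx[W[:, 0] < W[:, 1]]   # rows dominated by factor 1
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        leaves.append(idx)
        return
    hier_cluster(X, left, depth - 1, leaves)
    hier_cluster(X, right, depth - 1, leaves)

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((200, 50)))  # synthetic nonnegative data
leaves = []
hier_cluster(X, np.arange(len(X)), depth=2, leaves=leaves)
print([len(leaf) for leaf in leaves])  # sizes of up to 4 leaf clusters
```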
Message from the General Co-Chairs
A. Luque, Yousef Ibrahim, J. J. Rodríguez
DOI: 10.1109/micro.2006.35
Abstract: The main conference spans three days, January 8 through January 10, and is adjoined by two days of workshops before and after the main conference days. The first conference day will begin with two keynote speeches from research leaders in academia: Carla-Fabiana Chiasserini from Politecnico di Torino, Italy, and Gerhard P. Fettweis from TU-Dresden, Germany. The second day, January 9, will start with a newly introduced fireside chat with the two COMSNETS lifetime achievement awardees. On the second day, we will also have a distinguished banquet speaker in the evening: Rahul Mangharam from the University of Pennsylvania, USA. On the third day of the main conference, we will have two distinguished keynote speakers from industry: Sriram Rajamani from Microsoft Research, India, and Saravanan Radhakrishnan, CISCO, India.
Citations: 0
HiPC 2020 Technical Program Committee
DOI: 10.1109/hipc50609.2020.00008
Citations: 0