SC17: International Conference for High Performance Computing, Networking, Storage and Analysis: Latest Publications

PapyrusKV: A High-Performance Parallel Key-Value Store for Distributed NVM Architectures

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126943

Jungwon Kim, Seyong Lee, J. Vetter

Abstract: This paper introduces PapyrusKV, a parallel embedded key-value store (KVS) for distributed high-performance computing (HPC) architectures that offer potentially massive pools of nonvolatile memory (NVM). PapyrusKV stores keys with their values in arbitrary byte arrays across multiple NVMs in a distributed system. PapyrusKV provides standard KVS operations such as put, get, and delete. More importantly, PapyrusKV provides advanced features for HPC such as dynamic consistency control, zero-copy workflow, and asynchronous checkpoint/restart. Beyond filesystems, PapyrusKV provides HPC programmers with a high-level interface to exploit distributed NVM in the system, and it transparently organizes data to achieve high performance. Also, it allows HPC applications to specialize PapyrusKV to meet their specific requirements. We empirically evaluate PapyrusKV on three HPC systems with real NVM devices: OLCF’s Summitdev, TACC’s Stampede, and NERSC’s Cori. Our results show that PapyrusKV can offer high performance, scalability, and portability across these various distributed NVM architectures.

CCS Concepts: Information systems → Key-value stores; Hardware → Non-volatile memory; Software and its engineering → Distributed programming languages.

Citations: 43
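
The put, get, and delete operations named in the PapyrusKV abstract can be illustrated with a toy, single-process sketch. Everything here is an assumption made for illustration: the class name `ToyPartitionedKV`, the CRC32-based key placement, and the in-memory dictionaries standing in for per-node NVM are hypothetical and are not PapyrusKV's actual API or data layout.

```python
import zlib
from typing import Dict, List, Optional


class ToyPartitionedKV:
    """Toy key-value store that spreads byte-array keys over N partitions.

    Each partition is a plain dict standing in for one node's local storage
    (hypothetical; a real distributed store would use NVM and a network).
    """

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        self._parts: List[Dict[bytes, bytes]] = [{} for _ in range(num_partitions)]

    def owner(self, key: bytes) -> int:
        """Pick the partition that owns `key` using a deterministic hash."""
        return zlib.crc32(key) % len(self._parts)

    def put(self, key: bytes, value: bytes) -> None:
        """Store or overwrite the value for `key` on its owner partition."""
        self._parts[self.owner(key)][key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""
        return self._parts[self.owner(key)].get(key)

    def delete(self, key: bytes) -> bool:
        """Remove `key`; return True if it was present."""
        return self._parts[self.owner(key)].pop(key, None) is not None


kv = ToyPartitionedKV(num_partitions=4)
kv.put(b"particle/42", b"\x00\x01\x02")
print(kv.get(b"particle/42"))     # b'\x00\x01\x02'
print(kv.delete(b"particle/42"))  # True
print(kv.get(b"particle/42"))     # None
```

Hashing the key to choose an owner is only one possible placement policy; the abstract does not say which policy PapyrusKV uses.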

TagIt: An Integrated Indexing and Search Service for File Systems

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126929

Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai, Geoffroy R. Vallée, Seung-Hwan Lim, A. Butt

Abstract: Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems, and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata entail revisiting the decoupled file system-data services design philosophy. In this paper, we present TagIt, a scalable data management service framework aimed at scientific datasets, which is tightly integrated into a shared-nothing distributed file system. A key feature of TagIt is a scalable, distributed metadata indexing framework, using which we implement a flexible tagging capability to support data discovery. The tags can also be associated with an active operator, for pre-processing, filtering, or automatic metadata extraction, which we seamlessly offload to file servers in a load-aware fashion. Our evaluation shows that TagIt can expedite data search by up to 10× over the extant decoupled approach.

CCS Concepts: Software and its engineering → File systems management; Information systems → Distributed storage.

Citations: 31

Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126964

Guanpeng Li, S. Hari, Michael B. Sullivan, Timothy Tsai, K. Pattabiraman, J. Emer, S. Keckler

Abstract: Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems. Traditional methods for building resilient systems, e.g., Triple Modular Redundancy (TMR), are agnostic of the DNN algorithm and the DNN accelerator’s architecture. Hence, these traditional resilience approaches incur high overheads, which makes them challenging to deploy. In this paper, we experimentally evaluate the resilience characteristics of DNN systems (i.e., DNN software running on specialized accelerators). We find that the error resilience of a DNN system depends on the data types, values, data reuses, and types of layers in the design. Based on our observations, we propose two efficient protection techniques for DNN systems.

Citations: 364
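
One way to see why the resilience reported in the DNN abstract could depend on data type and value is to flip single bits in a stored number and compare the results. The sketch below is an illustration only, not the paper's fault-injection methodology; the helper names `flip_bit_float32` and `flip_bit_int8` and the sample values are assumptions.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` encoded as IEEE 754 float32 with one bit flipped.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def flip_bit_int8(value: int, bit: int) -> int:
    """Return a two's-complement int8 `value` with one bit flipped."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be in [0, 7]")
    if not -128 <= value <= 127:
        raise ValueError("value must fit in int8")
    raw = (value & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


weight = 0.5
print(flip_bit_float32(weight, 22))  # 0.75 (top mantissa bit)
print(flip_bit_float32(weight, 23))  # 1.0  (lowest exponent bit)
print(flip_bit_float32(weight, 30))  # about 1.7e38 (top exponent bit)
print(flip_bit_int8(64, 0))          # 65
print(flip_bit_int8(64, 7))          # -64 (sign bit)
```

In this toy setting a flip in a high exponent bit of a float32 changes the value by many orders of magnitude, while the worst case for an int8 is bounded by the type's range.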

A Framework for Scalable Biophysics-based Image Analysis

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126930

A. Gholami, A. Mang, Klaudius Scheufele, C. Davatzikos, M. Mehl, G. Biros

Abstract: We present SIBIA (Scalable Integrated Biophysics-based Image Analysis), a framework for coupling biophysical models with medical image analysis. It provides solvers for an image-driven inverse brain tumor growth model and an image registration problem, the combination of which can eventually help in diagnosis and prognosis of brain tumors. The two main computational kernels of SIBIA are a Fast Fourier Transform (FFT) implemented in the library AccFFT to discretize differential operators, and a cubic interpolation kernel for semi-Lagrangian based advection. We present efficiency and scalability results for the computational kernels, the inverse tumor solver, and image registration on two x86 systems, Lonestar 5 at the Texas Advanced Computing Center and Hazel Hen at the Stuttgart High Performance Computing Center. We showcase results that demonstrate that our solver can be used to solve registration problems of unprecedented scale, 4096^3 grid points resulting in approximately 200 billion unknowns, a problem size that is 64× larger than the state-of-the-art. For problem sizes of clinical interest, SIBIA is about 8× faster than the state-of-the-art.

CCS Concepts: Computing methodologies → Image segmentation; Mathematics of computing → Bio-inspired optimization.

Citations: 18
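
The problem size quoted in the SIBIA abstract can be checked with a few lines of arithmetic. Two assumptions are made here and are not stated in the abstract: that each grid point carries three unknowns (as for a three-component 3D velocity field), and that the 64× comparison corresponds to a 1024^3 grid, i.e. four times fewer points per dimension.

```python
def grid_points(n: int) -> int:
    """Number of points in an n x n x n grid."""
    return n ** 3


def unknowns(n: int, components_per_point: int = 3) -> int:
    """Total unknowns, assuming a fixed number of components per grid point."""
    return components_per_point * grid_points(n)


print(grid_points(4096))                       # 68719476736
print(unknowns(4096))                          # 206158430208, roughly 200 billion
print(grid_points(4096) // grid_points(1024))  # 64
```

Under these assumptions the figures in the abstract are mutually consistent: about 69 billion grid points, about 206 billion unknowns, and a factor of 64 over a 1024^3 grid.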

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126939

Philippe Tillet, David D. Cox

Citations: 27

Egeria: A Framework for Automatic Synthesis of HPC Advising Tools through Multi-Layered Natural Language Processing

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126961

Hui Guan, Xipeng Shen, H. Krim

Abstract: Achieving high performance on modern systems is challenging. Even with a detailed profile from a performance tool, writing or refactoring a program to remove its performance issues is still a daunting task for application programmers: it demands lots of program optimization expertise that is often system specific. Vendors often provide some detailed optimization guides to assist programmers in the process. However, these guides are frequently hundreds of pages long, making it difficult for application programmers to master and memorize all the rules and guidelines and properly apply them to a specific problem instance. In this work, we develop a framework named Egeria to alleviate the difficulty. Through Egeria, one can easily construct an advising tool for a certain high performance computing (HPC) domain (e.g., GPU programming) by providing Egeria with an optimization guide or other related documents for the target domain. An advising tool produced by Egeria provides a concise list of essential rules automatically extracted from the documents. At the same time, the advising tool serves as a question-answer agent that can interactively offer suggestions for specific optimization questions. Egeria is made possible through a distinctive multi-layered design that leverages natural language processing techniques and extends them with knowledge of HPC domains and how to extract information relevant to code optimization. Experiments on CUDA, OpenCL, and Xeon Phi programming guides demonstrate, both qualitatively and quantitatively, the usefulness of Egeria for HPC.

CCS Concepts: General and reference → Performance; Computing methodologies → Natural language processing.

Citations: 7

Understanding Object-level Memory Access Patterns Across the Spectrum

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126917

Xu Ji, Chao Wang, Nosayba El-Sayed, Xiaosong Ma, Youngjae Kim, Sudharshan S. Vazhkudai, W. Xue, Daniel Sánchez

Abstract: Memory accesses limit the performance and scalability of countless applications. Many design and optimization efforts will benefit from an in-depth understanding of memory access behavior, which is not offered by extant access tracing and profiling methods. In this paper, we adopt a holistic memory access profiling approach to enable a better understanding of program-system memory interactions. We have developed a two-pass tool adopting fast online and slow offline profiling, with which we have profiled, at the variable/object level, a collection of 38 representative applications spanning major domains (HPC, personal computing, data analytics, AI, graph processing, and datacenter workloads), at varying problem sizes. We have performed detailed result analysis and code examination. Our findings provide new insights into application memory behavior, including insights on per-object access patterns, adoption of data structures, and memory-access changes at different problem sizes. We find that scientific computation applications exhibit distinct behaviors compared to datacenter workloads, motivating separate memory system design/optimizations.

Citations: 17

LocoFS: A Loosely-Coupled Metadata Service for Distributed File Systems

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126928

Siyang Li, Youyou Lu, J. Shu, Yang Hu, Tao Li

Citations: 32

Run-to-run Variability on Xeon Phi based Cray XC Systems

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126926

Sudheer Chunduri, K. Harms, Scott Parker, V. Morozov, Samuel Oshin, N. Cherukuri, Kalyan Kumaran

Citations: 60

ParaStack: Efficient Hang Detection for MPI Programs at Large Scale

SC17: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2017-11-12 DOI: 10.1145/3126908.3126938

Hongbo Li, Zizhong Chen, Rajiv Gupta

Abstract: While program hangs on large parallel systems can be detected via the widely used timeout mechanism, it is difficult for users to set the timeout: too small a timeout leads to high false alarm rates, and too large a timeout wastes a vast amount of valuable computing resources. To address the above problems with hang detection, this paper presents ParaStack, an extremely lightweight tool to detect hangs in a timely manner with high accuracy, negligible overhead with great scalability, and without requiring the user to select a timeout value. For a detected hang, it provides direction for further analysis by telling users whether the hang is the result of an error in the computation phase or the communication phase. For a computation-error induced hang, our tool pinpoints the faulty process by excluding hundreds and thousands of other processes. We have adapted ParaStack to work with the Torque and Slurm parallel batch schedulers and validated its functionality and performance on Tianhe-2 and Stampede, which are respectively the world’s current 2nd and 12th fastest supercomputers. Experimental results demonstrate that ParaStack detects hangs in a timely manner at negligible overhead with over 99% accuracy. No false alarm is observed in correct runs taking 66 hours at a scale of 256 processes and 39.7 hours at a scale of 1024 processes. ParaStack accurately reports the faulty process for computation-error induced hangs.

Citations: 11