2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW): Latest Publications

Understanding Data Motion in the Modern HPC Data Center
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00012
Authors: Glenn K. Lockwood, S. Snyder, S. Byna, P. Carns, N. Wright
Abstract: The utilization and performance of storage, compute, and network resources within HPC data centers have been studied extensively, but much less work has gone toward characterizing how these resources are used in conjunction to solve larger scientific challenges. To address this gap, we present our work in characterizing workloads and workflows at a data-center-wide level by examining all data transfers that occurred between storage, compute, and the external network at the National Energy Research Scientific Computing Center over a three-month period in 2019. Using a simple abstract representation of data transfers, we analyze over 100 million transfer logs from Darshan, HPSS user interfaces, and Globus to quantify the load on data paths between compute, storage, and the wide-area network based on transfer direction, user, transfer tool, source, destination, and time. We show that parallel I/O from user jobs, while undeniably important, is only one of several major I/O workloads that occur throughout the execution of scientific workflows. We also show that this approach can be used to connect anomalous data traffic to specific users and file access patterns, and we construct time-resolved user transfer traces to demonstrate that one can systematically identify coupled data motion for individual workflows.
Citations: 9
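To make the paper's "abstract representation of data transfers" concrete, the sketch below aggregates transfer records by data path and by user. The record fields are hypothetical stand-ins, not the actual Darshan, HPSS, or Globus log schemas.

```python
from collections import defaultdict

# Each transfer reduced to: (user, tool, source tier, destination tier, bytes)
transfers = [
    ("alice", "darshan", "compute", "scratch", 4 * 2**30),
    ("alice", "globus",  "scratch", "wan",     1 * 2**30),
    ("bob",   "hpss",    "scratch", "tape",    8 * 2**30),
]

bytes_per_path = defaultdict(int)  # load on each (source, destination) path
bytes_per_user = defaultdict(int)  # per-user traffic, e.g. to flag anomalies

for user, tool, src, dst, nbytes in transfers:
    bytes_per_path[(src, dst)] += nbytes
    bytes_per_user[user] += nbytes

for (src, dst), nbytes in sorted(bytes_per_path.items()):
    print(f"{src:>8} -> {dst:<8} {nbytes / 2**30:6.1f} GiB")
```

Adding a timestamp field and grouping by (user, time window) would yield the kind of time-resolved per-user transfer traces the abstract mentions.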
[Copyright notice]
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/pdsw49588.2019.00002
Citations: 0
Towards Physical Design Management in Storage Systems
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00009
Authors: K. Dahlgren, J. LeFevre, Ashay Shirwadkar, Ken Iizawa, Aldrin Montana, P. Alvaro, C. Maltzahn
Abstract: In the post-Moore era, systems and devices with new architectures will arrive at a rapid rate, with significant impacts on the software stack. Applications will not be able to fully benefit from new architectures unless they can delegate adapting to new devices to lower layers of the stack. In this paper we introduce physical design management, which deals with the problem of identifying and executing transformations on physical designs of stored data, i.e., how data is mapped to storage abstractions like files, objects, or blocks, in order to improve performance. Physical design is traditionally placed with applications, access libraries, and databases, using hard-wired assumptions about underlying storage systems. Yet storage systems increasingly not only contain multiple kinds of storage devices with vastly different performance profiles but also move data among those storage devices, thereby changing the benefit of a particular physical design. We advocate placing physical design management in storage, identify interesting research challenges, provide a brief description of a prototype implementation in Ceph, and discuss the results of initial experiments at scale that are replicable using CloudLab. These experiments show the performance and resource utilization trade-offs associated with choosing different physical designs and choosing to transform between physical designs.
Citations: 3
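As a toy illustration of what a "physical design transformation" can mean, the sketch below re-maps the same logical records from a row-oriented layout to a column-oriented one. This is illustrative only; it is not the paper's Ceph prototype or its interfaces.

```python
def rows_to_columns(rows):
    """Transform a list of homogeneous records into one dict of columns."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns

records = [
    {"id": 1, "temp": 21.5},
    {"id": 2, "temp": 22.1},
]
print(rows_to_columns(records))  # {'id': [1, 2], 'temp': [21.5, 22.1]}
```

A scan that touches only "temp" now reads one contiguous column instead of every record; whether such a transformation pays for itself is exactly the kind of trade-off the paper's experiments quantify.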
Applying Machine Learning to Understand Write Performance of Large-scale Parallel Filesystems
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00008
Authors: Bing Xie, Zilong Tan, P. Carns, J. Chase, K. Harms, J. Lofstead, S. Oral, Sudharshan S. Vazhkudai, Feiyi Wang
Abstract: In high-performance computing (HPC), I/O performance prediction offers the potential to improve the efficiency of scientific computing. In particular, accurate prediction can make runtime estimates more precise, guide users toward optimal checkpoint strategies, and better inform facility provisioning and scheduling policies. HPC I/O performance is notoriously difficult to predict and model, however, in large part because of inherent variability and a lack of transparency in the behaviors of constituent storage system components. In this work we seek to advance the state of the art in HPC I/O performance prediction by (1) modeling the mean performance to address high variability, (2) deriving model features from write patterns, system architecture, and system configurations, and (3) employing a Lasso regression model to improve model accuracy. We demonstrate the efficacy of our approach by applying it to a crucial subset of common HPC I/O motifs, namely, file-per-process checkpoint write workloads. We conduct experiments on two distinct production HPC platforms — Titan at the Oak Ridge Leadership Computing Facility and Cetus at the Argonne Leadership Computing Facility — to train and evaluate our models. We find that we can attain ≤ 30% relative error for 92.79% and 99.64% of the samples in our test set on these platforms, respectively.
Citations: 10
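A hedged sketch of the modeling recipe the abstract describes: fit a Lasso regression to features derived from write patterns and system configuration, then check what fraction of held-out samples fall within 30% relative error. The features and data below are invented for illustration; the paper derives its own feature set from real platform logs.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: process count, bytes per process, stripe count
X = rng.uniform([64, 2**20, 1], [4096, 2**30, 64], size=(200, 3))
y = 0.002 * X[:, 0] + 1e-8 * X[:, 1] + 5.0 * X[:, 2] + rng.normal(0, 10, 200)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X[:150], y[:150])                    # train on 150 samples
pred = model.predict(X[150:])                  # evaluate on the remaining 50
rel_err = np.abs(pred - y[150:]) / np.abs(y[150:])
print(f"fraction within 30% relative error: {(rel_err <= 0.3).mean():.2%}")
```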
A Foundation for Automated Placement of Data
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00010
Authors: Douglas Otstott, Ming Zhao, Latchesar Ionkov
Abstract: With the increasing complexity of memory and storage, it is important to automate the decision of how to assign data structures to memory and storage devices. On one hand, this requires developing models to reconcile application access patterns against the limited capacity of higher-performance devices. On the other, such a modeling task demands a set of primitives to build from, and a toolkit that implements those primitives in a robust, dynamic fashion. We focus on the latter problem, and to that end we present an interface that abstracts the physical layout of data from the application developer. This will allow developers focused on optimized data placement to use our abstractions as the basis for their implementation, while application developers will see a unified, scalable, and resilient memory environment.
Citations: 0
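The sketch below speculates at what such placement primitives might look like: applications allocate named regions, and a policy layer, not the developer, decides which device tier backs them. Every name here is hypothetical; the paper's actual interface is only outlined in its text.

```python
class PlacementManager:
    """Toy placement policy over an ordered list of (tier, capacity) pairs."""

    def __init__(self, tiers):
        self.tiers = tiers                       # fastest tier first
        self.used = {name: 0 for name, _ in tiers}
        self.placement = {}                      # region -> tier name

    def allocate(self, region, size, hotness):
        """Prefer fast tiers for hot data, slow tiers for cold data."""
        order = self.tiers if hotness > 0.5 else list(reversed(self.tiers))
        for name, capacity in order:
            if self.used[name] + size <= capacity:
                self.used[name] += size
                self.placement[region] = name
                return name
        raise MemoryError(f"no tier has room for {region}")

mgr = PlacementManager([("hbm", 2**30), ("dram", 16 * 2**30), ("nvme", 2**40)])
print(mgr.allocate("lattice", 512 * 2**20, hotness=0.9))  # -> hbm
```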
Profiling Platform Storage Using IO500 and Mistral
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00011
Authors: Nolan D. Monnier, J. Lofstead, Margaret Lawson, M. Curry
Abstract: This paper explores how we used IO500 and the Mistral tool from Ellexus to observe detailed performance characteristics and inform I/O performance tuning on Astra, an ARM-based Sandia machine with an all-flash, Lustre-based storage array. Through this case study, we demonstrate that IO500 serves as a meaningful storage benchmark, even for all-flash storage. We also demonstrate that using fine-grained profiling tools, such as Mistral, is essential for revealing the details of tuning requirements. Overall, this paper demonstrates the value of a broad-spectrum benchmark like IO500, together with a fine-grained performance analysis tool such as Mistral, for understanding detailed storage system performance and enabling better-informed tuning.
Citations: 4
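For context on why IO500 works as a "broad spectrum" benchmark: its composite score aggregates many phases with geometric means, so one slow phase cannot be hidden behind a fast one. The sketch below shows that aggregation shape with made-up numbers; the real benchmark's phase list and scoring rules are more detailed.

```python
from math import prod

def geomean(values):
    return prod(values) ** (1.0 / len(values))

# Invented phase results, loosely following IO500's phase naming convention
bw_gib_s = {"ior-easy-write": 42.0, "ior-hard-write": 3.1,
            "ior-easy-read": 55.0, "ior-hard-read": 4.2}
md_kiops = {"mdtest-easy-write": 120.0, "mdtest-hard-write": 18.0,
            "mdtest-easy-stat": 310.0, "mdtest-hard-read": 25.0}

bw_score = geomean(list(bw_gib_s.values()))   # bandwidth sub-score
md_score = geomean(list(md_kiops.values()))   # metadata sub-score
print(f"bw={bw_score:.2f} GiB/s  md={md_score:.2f} kIOPS  "
      f"score={geomean([bw_score, md_score]):.2f}")
```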
Enabling Transparent Asynchronous I/O using Background Threads
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00006
Authors: Houjun Tang, Q. Koziol, S. Byna, J. Mainzer, Tonglin Li
Abstract: With scientific applications moving toward exascale levels, an increasing amount of data is being produced and analyzed. Providing efficient data access is crucial to the productivity of the scientific discovery process. Compared to improvements in CPU and network speeds, I/O performance lags far behind, such that moving data across the storage hierarchy can take longer than data generation or analysis. To alleviate this I/O bottleneck, asynchronous read and write operations are provided by the POSIX and MPI-I/O interfaces; they can overlap I/O operations with computation and thus hide I/O latency. However, these standards lack support for non-data operations such as file open, stat, and close, and their read and write operations require users to manually manage data dependencies and work with low-level byte offsets, which demands significant effort and expertise from applications. To overcome these issues, we present an asynchronous I/O framework that supports all I/O operations and manages data dependencies transparently and automatically. Our prototype asynchronous I/O implementation as an HDF5 VOL connector demonstrates the effectiveness of hiding the I/O cost from the application with low overhead and an easy-to-use programming interface.
Citations: 11
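A minimal sketch of the background-thread idea, assuming a single FIFO stream of writes (which trivially preserves ordering dependencies): calls enqueue work and return immediately, and a worker thread performs the blocking I/O while the application computes. The paper's framework, implemented as an HDF5 VOL connector, tracks dependencies across all I/O operations rather than relying on queue order.

```python
import queue
import threading

io_queue = queue.Queue()

def worker():
    while True:
        item = io_queue.get()
        if item is None:                 # shutdown sentinel
            break
        path, data = item
        with open(path, "a") as f:       # the blocking I/O happens here
            f.write(data)
        io_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def async_write(path, data):
    """Returns immediately; the write overlaps with later computation."""
    io_queue.put((path, data))

async_write("/tmp/out.log", "step 1 done\n")
# ... computation proceeds while the write completes in the background ...
io_queue.join()     # wait for outstanding I/O, like an explicit flush/barrier
io_queue.put(None)  # stop the worker
```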
Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-11-01 | DOI: 10.1109/PDSW49588.2019.00007
Authors: Megha Agarwal, Divyansh Singhvi, Preeti Malakar, S. Byna
Abstract: Parallel I/O is an indispensable part of scientific applications. The current parallel I/O stack contains many tunable parameters. While changing these parameters can increase I/O performance many-fold, application developers usually resort to default values because tuning is a cumbersome process that requires expertise. We propose two auto-tuning models, based on active learning, that recommend a good set of parameter values (currently tested with Lustre parameters and MPI-IO hints) for an application on a given system. These models use Bayesian optimization to find parameter values by minimizing an objective function. The first model runs the application to determine these values, whereas the second model uses an I/O prediction model instead, which reduces training time significantly (e.g., from 800 seconds to 18 seconds). Both models also provide the flexibility to focus on improving either read or write performance; to keep the tuning process generic, we have focused on both. We have validated our models using an I/O benchmark (IOR) and three scientific application I/O kernels (S3D-IO, BT-IO, and GenericIO) on two supercomputers (HPC2010 and Cori). Using the two models, we achieve an increase in I/O bandwidth of up to 11× over the default parameters. We obtained up to 3× improvement for 37 TB writes, corresponding to 1 billion particles in GenericIO, and up to 3.2× higher bandwidth for 4.8 TB of noncontiguous I/O in the BT-IO benchmark.
Citations: 10
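A hedged sketch of such a tuning loop: Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition over two hypothetical parameters (stripe count and stripe size). The run_io placeholder stands in for either of the paper's models: executing the application (the first model) or querying an I/O performance predictor (the second).

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def run_io(stripe_count, stripe_size_mb):
    # Placeholder objective (negative "bandwidth"); a real tuner would
    # measure the application or query a prediction model here.
    return -(stripe_count * stripe_size_mb) / (1 + abs(stripe_count - 16))

candidates = np.array([(c, s) for c in (1, 4, 8, 16, 32, 64)
                              for s in (1, 4, 16, 64)], dtype=float)
X, y = [candidates[0]], [run_io(*candidates[0])]

for _ in range(10):                                   # active-learning loop
    gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
    gp.fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(candidates, return_std=True)
    best = min(y)
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    ei = np.where(sigma > 1e-12, ei, 0.0)  # no gain from re-sampling points
    nxt = candidates[int(np.argmax(ei))]
    X.append(nxt)
    y.append(run_io(*nxt))

print("best parameters:", X[int(np.argmin(y))], "objective:", min(y))
```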
In Search of a Fast and Efficient Serverless DAG Engine
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) | Pub Date: 2019-10-14 | DOI: 10.1109/PDSW49588.2019.00005
Authors: Benjamin Carver, Jingyuan Zhang, Ao Wang, Yue Cheng
Abstract: Python-written data analytics applications can be modeled as, and compiled into, a directed acyclic graph (DAG)-based workflow, where the nodes are fine-grained tasks and the edges are task dependencies. Such analytics workflow jobs are increasingly characterized by short, fine-grained tasks with large fan-outs. These characteristics make them well-suited for a new cloud computing model called serverless computing or Function-as-a-Service (FaaS), which has become prevalent in recent years. The auto-scaling property of serverless computing platforms accommodates short tasks and bursty workloads, while the pay-per-use billing model of serverless computing providers keeps the cost of short tasks low. In this paper, we thoroughly investigate the problem space of DAG scheduling in serverless computing. We identify and evaluate a set of techniques to make DAG schedulers serverless-aware. These techniques have been implemented in WUKONG, a serverless DAG scheduler attuned to AWS Lambda. WUKONG provides decentralized scheduling through a combination of static and dynamic scheduling. We present the results of an empirical study in which WUKONG is applied to a range of microbenchmark and real-world DAG applications. Results demonstrate the efficacy of WUKONG in minimizing the performance overhead introduced by AWS Lambda: WUKONG achieves competitive performance compared to a serverful DAG scheduler, while improving the performance of real-world DAG jobs by as much as 4.1× at larger scale.
Citations: 27
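To illustrate the scheduling problem, the sketch below executes a small fan-out/fan-in DAG, dispatching each task as soon as its parents finish. A thread pool stands in for serverless function invocations here; WUKONG's actual decentralized scheduler on AWS Lambda combines static and dynamic scheduling and is considerably more sophisticated.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# DAG as task -> set of parent tasks (one load, three maps, one reduce)
deps = {"load": set(),
        "map1": {"load"}, "map2": {"load"}, "map3": {"load"},
        "reduce": {"map1", "map2", "map3"}}

def run_task(name):
    print("running", name)
    return name

def execute(deps):
    remaining = {t: set(p) for t, p in deps.items()}  # unmet parents per task
    scheduled, futures = set(), {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        def launch_ready():
            for task, parents in remaining.items():
                if not parents and task not in scheduled:
                    scheduled.add(task)
                    futures[pool.submit(run_task, task)] = task

        launch_ready()                        # roots have no parents
        while futures:
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in done:
                finished = futures.pop(fut)
                for parents in remaining.values():
                    parents.discard(finished)  # children may become ready
            launch_ready()

execute(deps)
```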