{"title":"CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs","authors":"Xiaolong Xie, Wei Tan, L. Fong, Yun Liang","doi":"10.1145/3078597.3078602","DOIUrl":"https://doi.org/10.1145/3078597.3078602","url":null,"abstract":"Stochastic gradient descent (SGD) is widely used by many machine learning algorithms. It is efficient for big data ap- plications due to its low algorithmic complexity. SGD is inherently serial and its parallelization is not trivial. How to parallelize SGD on many-core architectures (e.g. GPUs) for high efficiency is a big challenge. In this paper, we present cuMF_SGD, a parallelized SGD solution for matrix factorization on GPUs. We first design high-performance GPU computation kernels that accelerate individual SGD updates by exploiting model parallelism. We then design efficient schemes that parallelize SGD updates by exploiting data parallelism. Finally, we scale cuMF SGD to large data sets that cannot fit into one GPU's memory. Evaluations on three public data sets show that cuMF_SGD outperforms existing solutions, including a 64- node CPU system, by a large margin using only one GPU card.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130948769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ArrayUDF: User-Defined Scientific Data Analysis on Arrays","authors":"Bin Dong, Kesheng Wu, S. Byna, Jialin Liu, Weijie Zhao, Florin Rusu","doi":"10.1145/3078597.3078599","DOIUrl":"https://doi.org/10.1145/3078597.3078599","url":null,"abstract":"User-Defined Functions (UDF) allow application programmers to specify analysis operations on data, while leaving the data management tasks to the system. This general approach enables numerous custom analysis functions and is at the heart of the modern Big Data systems. Even though the UDF mechanism can theoretically support arbitrary operations, a wide variety of common operations -- such as computing the moving average of a time series, the vorticity of a fluid flow, etc., -- are hard to express and slow to execute. Since these operations are traditionally performed on multi-dimensional arrays, we propose to extend the expressiveness of structural locality for supporting UDF operations on arrays. We further propose an in situ UDF mechanism, called ArrayUDF, to implement the structural locality. ArrayUDF allows users to define computations on adjacent array cells without the use of join operations and executes the UDF directly on arrays stored in data files without requiring to load their content into a data management system. Additionally, we present a thorough theoretical analysis of the data access cost to exploit the structural locality, which enables ArrayUDF to automatically select the best array partitioning strategy for a given UDF operation. In a series of performance evaluations on large scientific datasets, we have observed that -- using the generic UDF interface -- ArrayUDF consistently outperforms Spark, SciDB, and RasDaMan.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132439376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Scientific Computing to Advance Wildland Fire Monitoring and Prediction","authors":"J. Coen","doi":"10.1145/3078597.3078619","DOIUrl":"https://doi.org/10.1145/3078597.3078619","url":null,"abstract":"New technologies have transformed our understanding of wildland fire behavior, providing a better ability to observe them from a variety of platforms, simulate their growth with computational models, and interpret their frequency and controls in a global context. These tools have shown how wildland fires are among the extremes of weather events and can produce behaviors such as fire whirls, blow-ups, bursts of flame along the surface, and winds ten times stronger than ambient conditions, all of which result from the interactions between a fire and its atmospheric environment. I will highlight current research in integrated weather -- wildland fire computational modeling, fire detection, and observation, and their application to understanding and prediction. Coupled weather-wildland fire models tie numerical weather prediction models to wildland fire behavior modules to simulate the impact of a fire on the atmosphere and the subsequent feedback of these fire-induced winds on fire behavior, i.e. how a fire \"creates its own weather\". NCAR's CAWFE® modeling system has been used to explain fundamental fire phenomena and reproduce the unfolding of past fire events. Recent work, in which CAWFE has been integrated with satellite-based active fire detection data, addresses the challenges of applying it as an operational forecast tool. This newer generation of tools brought many goals within sight -- rapid fire detection, nearly ubiquitous monitoring, and recognition that many of the distinctive characteristics of fire events are reproducible and perhaps predictable in real time. Concurrently, these more complex tools raise new challenges. I conclude with innovative model-data fusion approaches to overcome some of these remaining puzzles.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127885176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations","authors":"Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, T. Hoefler","doi":"10.1145/3078597.3078616","DOIUrl":"https://doi.org/10.1145/3078597.3078616","url":null,"abstract":"We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state. We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the amount of used locks, atomics, and reads/writes. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The conducted analysis illustrates surprising differences between push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed up by performance data from hardware counters. We use these findings to illustrate which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively-parallel shared-memory machines as well as distributed-memory systems.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130243522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a More Complete Understanding of SDC Propagation","authors":"Jon C. Calhoun, M. Snir, Luke N. Olson, W. Gropp","doi":"10.1145/3078597.3078617","DOIUrl":"https://doi.org/10.1145/3078597.3078617","url":null,"abstract":"With the rate of errors that can silently effect an application's state/output expected to increase on future HPC machines, numerous application-level detection and recovery schemes have been proposed. Recovery is more efficient when errors are contained and affect only part of the computation's state. Containment is usually achieved by verifying all information leaking out of a statically defined containment domain, which is an expensive procedure. Alternatively, error propagation can be analyzed to bound the domain that is affected by a detected error. This paper investigates how silent data corruption (SDC) due to soft errors propagates through three HPC applications: HPCCG, Jacobi, and CoMD. To allow for more detailed view of error propagation, the paper tracks propagation at the instruction and application variable level. The impact of detection latency on error propagation is shown along with an application's ability to recover. Finally, the impact of compiler optimizations are explored along with the impact of local problem size on error propagation.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130830845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Workflow-Aware Scheduling on HPC Systems","authors":"G. P. R. Álvarez, E. Elmroth, Per-Olov Östberg, L. Ramakrishnan","doi":"10.1145/3078597.3078604","DOIUrl":"https://doi.org/10.1145/3078597.3078604","url":null,"abstract":"Scientific workflows are increasingly common in the workloads of current High Performance Computing (HPC) systems. However, HPC schedulers do not incorporate workflow-specific mechanisms beyond the capacity to declare dependencies between their jobs. Thus, workflows are run as sets of batch jobs with dependencies, which induces long intermediate wait times and, consequently, long workflow turnaround times. Alternatively, to reduce their turnaround time, workflows may be submitted as single pilot jobs that are allocated their maximum required resources for their entire runtime. Pilot jobs achieve shorter turnaround times but reduce the HPC system's utilization because resources may idle during the workflow's execution. We present a workflow-aware scheduling (WoAS) system that enables existing scheduling algorithms to exploit fine-grained information on a workflow's resource requirements and structure without modification. The current implementation of WoAS is integrated into Slurm, a widely used HPC batch scheduler. We evaluate the system using a simulator using real and synthetic workflows and a synthetic baseline workload that captures job patterns observed over three years of workload data from Edison, a large supercomputer hosted at the National Energy Research Scientific Computing Center. Our results show that WoAS reduces workflow turnaround times and improves system utilization without significantly slowing down conventional jobs.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"768 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134614050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Output Performance of a Petascale Supercomputer","authors":"Bing Xie, Yezhou Huang, J. Chase, J. Choi, S. Klasky, J. Lofstead, S. Oral","doi":"10.1145/3078597.3078614","DOIUrl":"https://doi.org/10.1145/3078597.3078614","url":null,"abstract":"In this paper, we develop a predictive model useful for output performance prediction of supercomputer file systems under production load. Our target environment is Titan---the 3rd fastest supercomputer in the world---and its Lustre-based multi-stage write path. We observe from Titan that although output performance is highly variable at small time scales, the mean performance is stable and consistent over typical application run times. Moreover, we find that output performance is non-linearly related to its correlated parameters due to interference and saturation on individual stages on the path. These observations enable us to build a predictive model of expected write times of output patterns and I/O configurations, using feature transformations to capture non-linear relationships. We identify the candidate features based on the structure of the Lustre/Titan write path, and use feature transformation functions to produce a model space with 135,000 candidate models. By searching for the minimal mean square error in this space we identify a good model and show that it is effective.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125468411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnosing Machine Learning Pipelines with Fine-grained Lineage","authors":"Zhao Zhang, Evan R. Sparks, M. Franklin","doi":"10.1145/3078597.3078603","DOIUrl":"https://doi.org/10.1145/3078597.3078603","url":null,"abstract":"We present the Hippo system to enable the diagnosis of distributed machine learning (ML) pipelines by leveraging fine-grained data lineage. Hippo exposes a concise yet powerful API, derived from primitive lineage types, to capture fine-grained data lineage for each data transformation. It records the input datasets, the output datasets and the cell-level mapping between them. It also collects sufficient information that is needed to reproduce the computation. Hippo efficiently enables common ML diagnosis operations such as code debugging, result analysis, data anomaly removal, and computation replay. By exploiting the metadata separation and high-order function encoding strategies, we observe an O(10^3)x total improvement in lineage storage efficiency vs. the baseline of cell-wise mapping recording while maintaining the lineage integrity. Hippo can answer the real use case lineage queries within a few seconds, which is low enough to enable interactive diagnosis of ML pipelines.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129447803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TCP Throughput Profiles Using Measurements over Dedicated Connections","authors":"N. Rao, Qiang Liu, S. Sen, D. Towsley, Gayane Vardoyan, R. Kettimuthu, Ian T Foster","doi":"10.1145/3078597.3078615","DOIUrl":"https://doi.org/10.1145/3078597.3078615","url":null,"abstract":"ide-area data transfers in high-performance computing infrastructures are increasingly being carried over dynamically provisioned dedicated network connections that provide high capacities with no competing traffic. We present extensive TCP throughput measurements and time traces over a suite of physical and emulated 10 Gbps connections with 0-366 ms round-trip times (RTTs). Contrary to the general expectation, they show significant statistical and temporal variations, in addition to the overall dependencies on the congestion control mechanism, buffer size, and the number of parallel streams. We analyze several throughput profiles that have highly desirable concave regions wherein the throughput decreases slowly with RTTs, in stark contrast to the convex profiles predicted by various TCP analytical models. We present a generic throughput model that abstracts the ramp-up and sustainment phases of TCP flows, which provides insights into qualitative trends observed in measurements across TCP variants: (i) slow-start followed by well-sustained throughput leads to concave regions; (ii) large buffers and multiple parallel streams expand the concave regions in addition to improving the throughput; and (iii) stable throughput dynamics, indicated by a smoother Poincare map and smaller Lyapunov exponents, lead to wider concave regions. These measurements and analytical results together enable us to select a TCP variant and its parameters for a given connection to achieve high throughput with statistical guarantees.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127807903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NICE: Network-Integrated Cluster-Efficient Storage","authors":"S. Al-Kiswany, Suli Yang, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau","doi":"10.1145/3078597.3078612","DOIUrl":"https://doi.org/10.1145/3078597.3078612","url":null,"abstract":"We present NICE, a key-value storage system design that leverages new software-defined network capabilities to build cluster-based network-efficient storage system. NICE presents novel techniques to co-design network routing and multicast with storage replication, consistency, and load balancing to achieve higher efficiency, performance, and scalability. We implement the NICEKV prototype. NICEKV follows the NICE approach in designing four essential network-centric storage mechanisms: request routing, replication, consistency, and load balancing. Our evaluation shows that the proposed approach brings significant performance gains compared to the current key-value systems design: up to 7× put/get performance improvement, up to 2× reduction in network load, 3× to 9× load reduction on the storage nodes, and the elimination of scalability bottlenecks present in current designs.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129296610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}