{"title":"A Performance Analysis of Vector Length Agnostic Code","authors":"Angela Pohl, Mirko Greese, Biagio Cosenza, B. Juurlink","doi":"10.1109/HPCS48598.2019.9188238","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188238","abstract":"Vector extensions are a popular means to exploit data parallelism in applications. Over recent years, the most commonly used extensions have been growing in vector length and number of vector instructions. However, code portability remains a problem across a compute continuum. Hence, vector length agnostic (VLA) architectures have been proposed for future generations of ARM and RISC-V processors. With these architectures, code is vectorized independently of the vector length of the target hardware platform, making it possible to tune software to a generic vector length. To understand the performance impact of VLA code compared to vector length specific code, we analyze the current capabilities of code generation for ARM’s SVE architecture. Our experiments show that VLA code reaches about 90% of the performance of vector length specific code, i.e. a 10% overhead is incurred due to global predication of instructions. Furthermore, we show that code performance does not increase proportionally with increasing vector lengths due to higher memory demands.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
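The global predication this abstract attributes the ~10% overhead to can be illustrated with a small emulation. This is a hypothetical Python sketch, not the paper's code: on ARM SVE the per-lane predicate is built in hardware by `whilelt`, and `vla_add` and `vl` are invented names.

```python
def vla_add(a, b, vl):
    """Emulate a vector-length-agnostic loop: process `vl` lanes per
    iteration and mask off inactive lanes in the final partial chunk,
    the way SVE's `whilelt` predicate does."""
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        # predicate: lane j is active while i + j < n
        mask = [i + j < n for j in range(vl)]
        for j, active in enumerate(mask):
            if active:
                out[i + j] = a[i + j] + b[i + j]
        i += vl  # advance by the hardware vector length
    return out
```

The result is identical for any `vl`, which is the portability claim of VLA code; the overhead comes from recomputing the predicate on every iteration even when all lanes are active.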
{"title":"High Performance and Scalable Simulations of a Bio-inspired Computational Model","authors":"Sandra Gómez Canaval, V. Mitrana, M. Păun, Stanislav Vararuk","doi":"10.1109/HPCS48598.2019.9188187","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188187","abstract":"The Network of Polarized Evolutionary Processors (NPEP) is a rather new variant of the bio-inspired computing model called Network of Evolutionary Processors (NEP). This model, together with its variants, is able to provide theoretically feasible solutions to hard computational problems. NPEPE is a software engine that simulates NPEP; it is deployed on Giraph, an ultra-scalable platform based on the Bulk Synchronous Parallel (BSP) programming model. Rather surprisingly, the BSP model and the underlying architecture of NPEP have many points in common, and these similarities are shared by all variants in the NEP family. We take advantage of these similarities and propose an extension of NPEPE (named gNEP) that enables it to simulate any variant of the NEP family. The extended gNEP framework presents a twofold contribution. Firstly, a flexible architecture whose software components can be extended to include other NEP models (including the seminal NEP model and new ones). Secondly, a component that translates input configuration files, representing a problem instance and an algorithm based on different NEP variants, into suitable input files for the gNEP framework. In this work, we simulate an NPEP-based solution to the “3-colorability” problem and compare the results of a specific experiment using the NPEPE engine and gNEP. Moreover, we present several experiments that provide a preliminary study of the scalability with which gNEP deploys and executes problem instances requiring more intensive computation.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
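The BSP similarity the authors exploit can be sketched as a minimal superstep loop in the style of Giraph. This is an illustrative Python model, not gNEP code; `bsp_run` and `compute` are invented names.

```python
def bsp_run(vertices, compute, max_supersteps=100):
    """Minimal Bulk Synchronous Parallel loop in the style of Giraph:
    in each superstep every active vertex consumes its inbox and may
    emit messages; a global barrier separates supersteps, and a halted
    vertex is reactivated by incoming messages."""
    inbox = {v: [] for v in vertices}
    active = set(vertices)
    step = 0
    for step in range(max_supersteps):
        if not active:
            break
        outbox = {v: [] for v in vertices}
        for v in list(active):
            halt, msgs = compute(v, inbox[v], step)  # msgs: [(dest, payload)]
            for dest, payload in msgs:
                outbox[dest].append(payload)
            if halt:
                active.discard(v)
        inbox = outbox                      # barrier: deliver for next step
        active |= {v for v, m in inbox.items() if m}
    return step
```

In a gNEP-style mapping, each NEP processor would play the role of a vertex and the evolutionary/communication steps would alternate across supersteps.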
{"title":"Node-Level Optimization of a 3D Block-Based Multiresolution Compressible Flow Solver with Emphasis on Performance Portability","authors":"N. Hoppe, S. Adami, N. Adams, I. Pasichnyk, M. Allalen","doi":"10.1109/HPCS48598.2019.9188088","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188088","abstract":"Despite the enormous increase in computational power in recent decades, the numerical study of complex flows remains challenging. State-of-the-art techniques for simulating hyperbolic flows with discontinuities rely on computationally demanding nonlinear schemes, such as Riemann solvers with weighted essentially non-oscillatory (WENO) stencils and characteristic decomposition. To handle this complexity, the numerical load can be reduced via a multiresolution (MR) algorithm with local time stepping (LTS) running on modern high-performance computing (HPC) systems. Ultimately, the main challenge lies in efficient utilization of the available HPC hardware. In this work, we evaluate the performance improvement for a Message Passing Interface (MPI)-parallelized MR solver using single instruction multiple data (SIMD) optimizations. We present straightforward code modifications that allow for auto-vectorization by the compiler while maintaining the modularity of the code at comparable performance. We demonstrate performance improvements for representative Euler flow examples on both Intel Haswell and Intel Knights Landing Xeon Phi (KNL) clusters. The tests show single-core speedups of 1.7 (1.9) and average speedups of 1.4 (1.6) on Haswell (KNL).","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
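The kind of code modification the abstract describes can be shown in miniature. A hedged Python sketch (the actual solver is C++ and relies on the compiler's auto-vectorizer; both function names are invented): the same 3-point stencil written first as a per-element loop, then over whole shifted ranges, the unit-stride, branch-free shape that SIMD units handle well.

```python
def smooth_scalar(u):
    """Per-element loop: the shape that, in C/C++, compilers often
    struggle to auto-vectorize when the body grows branches."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
    return v

def smooth_strided(u):
    """Same stencil over whole shifted ranges (unit-stride, branch-free):
    the restructuring that lets a compiler emit SIMD code."""
    v = list(u)
    v[1:-1] = [0.25 * a + 0.5 * b + 0.25 * c
               for a, b, c in zip(u[:-2], u[1:-1], u[2:])]
    return v
```

Both produce identical results; only the second form exposes the independence of iterations explicitly.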
{"title":"An Incremental Parallel PGAS-based Tree Search Algorithm","authors":"T. Carneiro, N. Melab","doi":"10.1109/HPCS48598.2019.9188106","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188106","abstract":"In this work, we show that the Chapel high-productivity language is suitable for the design and implementation of all aspects involved in the conception of parallel tree search algorithms for solving combinatorial problems. Initially, it is possible to hand-optimize the data structures involved in the search process in a way equivalent to C. As a consequence, the single-threaded search in Chapel is on average only 7% slower than its counterpart written in C. Whereas programming a multicore tree search in Chapel is equivalent to C-OpenMP in terms of performance and programmability, its productivity-aware features for distributed programming stand out. It is possible to incrementally conceive a distributed tree search algorithm starting from its multicore counterpart by adding a few lines of code. The distributed implementation performs load balancing among different compute nodes and also exploits all CPU cores of the system. Chapel presents an interesting tradeoff between programmability and performance despite its high-level features. The distributed tree search in Chapel is on average 16% slower and reaches up to 80% of the scalability achieved by its C-MPI+OpenMP counterpart.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
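The hand-optimized, stack-based search the abstract alludes to looks roughly like this. A hypothetical Python sketch using N-queens as a stand-in combinatorial problem (the paper's Chapel/C sources and benchmarks are not reproduced here): an iterative depth-first search over an explicit stack of bitmask-encoded nodes, the kind of compact node representation that can be tuned equally well in Chapel or C.

```python
def count_nqueens(n):
    """Iterative depth-first tree search with an explicit stack instead
    of recursion. Each node is (row, cols, d1, d2), where the three
    bitmasks record attacked columns and diagonals."""
    solutions = 0
    full = (1 << n) - 1
    stack = [(0, 0, 0, 0)]
    while stack:
        row, cols, d1, d2 = stack.pop()
        if row == n:
            solutions += 1
            continue
        free = full & ~(cols | d1 | d2)
        while free:
            bit = free & -free          # lowest free column
            free ^= bit
            stack.append((row + 1, cols | bit,
                          ((d1 | bit) << 1) & full, (d2 | bit) >> 1))
    return solutions
```

A multicore or distributed version would partition the initial stack among workers; the paper's point is that this step costs only a few extra lines in Chapel.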
{"title":"rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Independent Tasks","authors":"Ali Mohammed, Aurélien Cavelan, F. Ciorba","doi":"10.1109/HPCS48598.2019.9188153","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188153","abstract":"Parallel scientific applications that execute on high performance computing (HPC) systems often contain large and computationally-intensive parallel loops. The independent loop iterations of such applications represent independent tasks. Dynamic load balancing (DLB) is used to achieve a balanced execution of such applications. However, most of the self-scheduling-based techniques typically used to achieve DLB are not robust against component (e.g., processor, network) failures or perturbations that arise on large HPC systems. The self-scheduling-based techniques that tolerate failures and/or perturbations rely on fault- and/or perturbation-detection mechanisms to trigger the rescheduling of tasks scheduled onto failed and/or perturbed components. This work proposes a novel robust dynamic load balancing (rDLB) approach for the robust self-scheduling of scientific applications with independent tasks on HPC systems under failures and/or perturbations. rDLB proactively reschedules already allocated tasks and requires no detection of failures or perturbations. Moreover, rDLB is integrated into an MPI-based DLB library. An analytical model of rDLB shows that, for a fixed problem size, the fault-tolerance overhead decreases linearly with the number of processors. The experimental evaluation shows that applications using rDLB tolerate up to P-1 worker processor failures (P is the number of processors allocated to the application) and that their performance in the presence of perturbations improves by a factor of 7 compared to the case without rDLB. Moreover, the robustness of applications against perturbations (i.e., flexibility) is boosted by a factor of 30 using rDLB compared to the case without rDLB.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
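The core rDLB mechanism, proactive re-execution without failure detection, can be modeled in a few lines. This is a toy Python simulation with invented names (`rdlb_schedule`, string worker ids); the real implementation is an MPI-based DLB library.

```python
def rdlb_schedule(tasks, workers, failed):
    """Toy model of robust self-scheduling: workers pull tasks from a
    shared queue; once the queue is empty, idle workers proactively
    re-execute tasks that were claimed but never finished, so no
    failure-detection mechanism is needed. Tolerates up to P-1 worker
    failures, mirroring the paper's claim."""
    assert set(workers) - set(failed), "need at least one live worker"
    finished = set()
    queue = list(tasks)
    while len(finished) < len(tasks):
        for w in workers:                   # one round-robin "time step"
            if queue:
                t = queue.pop(0)            # normal self-scheduling
            else:
                pending = [t for t in tasks if t not in finished]
                if not pending:
                    break
                t = pending[0]              # proactive re-execution
            if w not in failed:             # failed workers claim tasks
                finished.add(t)             # but never complete them
    return finished
```

Even when three of four workers silently fail after claiming tasks, the surviving worker completes every task without ever being told who failed.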
{"title":"Optimizations in CuSNP Simulator for Spiking Neural P Systems on CUDA GPUs","authors":"Blaine Corwyn D. Aboy, Edward James A. Bariring, J. P. Carandang, F. Cabarle, R. T. Cruz, H. Adorna, Miguel A. Martínez-del-Amor","doi":"10.1109/HPCS48598.2019.9188174","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188174","abstract":"Spiking Neural P systems (in short, SNP systems) are computing models based on living neurons. SNP systems are non-deterministic and parallel; hence, a parallel processor such as a graphics processing unit (in short, GPU) is a natural candidate for their simulation. Matrix representations and algorithms were previously developed for simulating SNP systems. In this work, our two results extend previous work on simulating SNP systems on the GPU: (a) the number of neurons the simulator can handle is now arbitrary; (b) SNP systems are now represented in a dense instead of a sparse way. The impact of these extensions on the simulator’s time and space requirements is analysed. As expected, SNP systems with more neurons need more simulation time, although simulator performance can scale (i.e. perform better) with larger GPUs. The dense representation helps in the simulation of larger systems.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
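The matrix representation the abstract builds on, from prior work on simulating SNP systems, computes each step as C(k+1) = C(k) + s(k)·M. A plain-Python sketch under that assumption (the actual simulator evaluates this product on CUDA GPUs; `snp_step` is an invented name):

```python
def snp_step(config, spiking, M):
    """One simulation step in the matrix representation: config counts
    spikes per neuron, spiking flags which rules fire this step, and
    row r of the (dense) transition matrix M records rule r's effect:
    negative entries for spikes consumed, positive for spikes sent."""
    return [c + sum(s * M[r][j] for r, s in enumerate(spiking))
            for j, c in enumerate(config)]
```

With two neurons and one rule per neuron (each consuming one spike and sending one to the other neuron), firing rule 0 from configuration [2, 0] yields [1, 1]. The dense layout stores every M[r][j], including zeros, which is exactly the time/space trade-off the paper analyses.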
{"title":"The Impact of the AC922 Architecture on Performance of Deep Neural Network Training","authors":"P. Rosciszewski, Michał Iwański, P. Czarnul","doi":"10.1109/HPCS48598.2019.9188164","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188164","abstract":"Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for artificial intelligence applications, including the IBM Power System AC922. In this paper we evaluate an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs on selected deep neural network training applications, including four convolutional and one recurrent model. We report performance results depending on batch sizes and GPU selection and compare them with the results from another contemporary workstation based on the same set of GPUs – NVIDIA® DGX Station™. The results show that the AC922 performs better in all tested configurations, achieving improvements up to 10.3%. Profiling indicates that the improvement is due to the efficient I/O pipeline. The performance differences depend on the specific model, rather than on the model class (RNN/CNN). Both systems offer good scalability up to 4 GPUs. In certain cases there is a significant difference in performance depending on exactly which GPUs are used for computations.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
{"title":"Queue Waiting Time Prediction for Large-scale High-performance Computing System","authors":"Ju-Won Park","doi":"10.1109/HPCS48598.2019.9188119","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188119","abstract":"Traditionally, high-performance computing (HPC) systems have been extensively utilized in many science fields, including big data analysis and machine learning. Such large-scale HPC resources typically use queue management systems that prefer the space-sharing method to allocate resources. With space sharing, when resources are insufficient, jobs naturally incur a queue waiting time until resources become available. When a prediction of this waiting time is available, scheduler performance can be improved. To this end, we propose a method for predicting queue waiting times based on job logs from an HPC system in actual operation. The technique comprises three phases. The first phase pre-processes the data into constant time intervals. In the second phase, major features are selected through a factor analysis and clustering is conducted based on the selected features. In the third phase, the waiting time of the next job is predicted using a sliding-window method over the clustered jobs.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
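The third phase's sliding-window prediction can be sketched as follows. This is a hypothetical Python illustration (class and parameter names are invented, and the cluster labels are assumed to come from the factor-analysis and clustering phases):

```python
from collections import defaultdict, deque

class WaitTimePredictor:
    """Predict the next job's queue waiting time as the mean over a
    sliding window of the most recently observed jobs in the same
    cluster; unseen clusters fall back to zero."""
    def __init__(self, window=5):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, cluster, wait_time):
        self.history[cluster].append(wait_time)  # old entries fall out

    def predict(self, cluster):
        h = self.history[cluster]
        return sum(h) / len(h) if h else 0.0
```

The window makes the estimate track recent system load rather than the full history, which is the usual motivation for sliding-window predictors on job logs.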
{"title":"Comparing Neuromorphic Systems by Solving Sudoku Problems","authors":"Christoph Ostrau, Christian Klarhorst, Michael Thies, U. Rückert","doi":"10.1109/HPCS48598.2019.9188207","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188207","abstract":"In the field of neuromorphic computing several hardware accelerators for spiking neural networks have been introduced, but few studies actually compare different systems. These comparative studies reveal difficulties in porting an existing network to a specific system and in predicting its performance indicators. Finding a common network architecture that is suited for all target platforms and at the same time yields decent results is a major challenge. In this contribution, we show that a winner-takes-all inspired network structure can be employed to solve Sudoku puzzles on three diverse hardware accelerators. By exploring several network implementations, we measured the number of solved puzzles in a set of 100 assorted Sudokus, as well as time and energy to solution. Concerning the last two indicators, our measurements indicate that it can be beneficial to port a network to an analogue hardware system.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
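The winner-takes-all structure can be illustrated with a single rate-based group. This is a loose Python analogue, an assumption rather than the paper's networks (which are spiking and run on neuromorphic hardware, with one WTA group per Sudoku cell so that exactly one digit wins):

```python
def wta(drive, inhibition=0.5, steps=50):
    """Crude rate-based winner-takes-all group: each unit is driven by
    its input and inhibited by the summed activity of the others; after
    a few steps only the most strongly driven unit stays active.
    Returns the index of the winning unit."""
    x = list(drive)
    for _ in range(steps):
        total = sum(x)
        x = [max(0.0, xi + di - inhibition * (total - xi))
             for xi, di in zip(x, drive)]
    return x.index(max(x))
```

In a Sudoku network, the drive for each digit would come from row/column/box constraints, implemented as inhibitory connections between groups.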
{"title":"DP2: A Highly Parallel Range Join for Genome Analysis on Distributed Computing Platform","authors":"Aman Sinha, B. Lai","doi":"10.1109/HPCS48598.2019.9188222","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188222","abstract":"The rapid growth of genome data and the intense computation required pose great challenges for downstream genome analytics. Efficient parallel processing and distributed computing are two effective schemes to address the analysis of big data. Range join is a widely used, effective, yet time-consuming operation that finds the overlap between two different sets of genome features. The currently widely adopted BEDTools [6] pipeline uses a single-node binary-tree approach, while the distributed GenAp scheme fails to exploit the massive parallel computation of modern throughput processors, such as GPUs (Graphics Processing Units). This paper proposes a novel Distributed Parallel P-ary search (DP2) that applies P-ary analysis to enable high parallelism at the algorithmic level and extensively utilizes multiple GPUs at the system and architecture level. Efficient computation allocation is implemented to leverage distributed computing on clusters. The proposed framework integrates well with the current BEDTools [6] pipeline, and achieves an average 25x speedup for the actual range-join operation compared with the binary-tree approach of GenAp, and a 13x end-to-end (total execution time) speedup in comparison to ADAM.","journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)"},"publicationDate":"2019-07-01"}
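The range-join semantics at the heart of DP2 can be shown with a serial stand-in. A hedged Python sketch: sorting one feature set by start and binary-searching the candidate window is a sequential analogue of the paper's P-ary search, which instead splits each lookup across P GPU threads; `range_join` and the half-open interval convention are assumptions for illustration.

```python
from bisect import bisect_left

def range_join(features_a, features_b):
    """Range join on genome features: report every pair of half-open
    intervals (a, b) that overlap. B is sorted by start; for each a we
    binary-search the upper bound of candidate starts, then filter by
    end coordinate."""
    b_sorted = sorted(features_b)               # sort by (start, end)
    starts = [s for s, _ in b_sorted]
    out = []
    for a_start, a_end in features_a:
        hi = bisect_left(starts, a_end)         # every b with start < a_end
        for s, e in b_sorted[:hi]:
            if e > a_start:                     # half-open overlap test
                out.append(((a_start, a_end), (s, e)))
    return out
```

The candidate scan keeps the sketch short; the paper's contribution is replacing the per-query search with a P-way partitioned search executed by many GPU threads and distributing queries across cluster nodes.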