2017 International Conference on High Performance Computing & Simulation (HPCS): Latest Publications, Page 2

Accelerating Matrix Multiplication in Deep Learning by Using Low-Rank Approximation

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-17 DOI: 10.1109/HPCS.2017.37

Kazuki Osawa, Akira Sekiya, Hiroki Naganuma, Rio Yokota

Citations: 12

Formalization of a Big Graph API in Coq

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-17 DOI: 10.1109/HPCS.2017.140

Jolan Philippe, Wadoud Bousdira, F. Loulergue

Citations: 0

Scalable NUMA-Aware Wilson-Dirac on Supercomputers

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-17 DOI: 10.1109/HPCS.2017.56

C. Tadonki

Abstract: We revisit the Wilson-Dirac operator, also referred to as Dslash, on NUMA manycore vector machines and thereby seek an efficient supercomputing implementation. Quantum ChromoDynamics (QCD) is the theory of the strong nuclear force, and its discrete formalism is the so-called Lattice Quantum ChromoDynamics (LQCD). Wilson-Dirac is the major computing kernel in LQCD, where special attention is paid to large-scale simulations. The corresponding computing demand is tremendous at various levels, from storage to floating-point operations, hence the crucial need for powerful supercomputers. Designing efficient LQCD codes on modern (mostly hybrid) supercomputers requires efficiently exploiting all available levels of parallelism, including accelerators. Since Wilson-Dirac is a coarse-grain stencil computation performed on a huge volume of data, any performance and scalability investigation should skillfully address memory accesses and interprocessor communication overheads. In order to lower the latter, explicit shared-memory implementations should be considered at the level of a compute node, since this leads to a less complex data communication graph and thus (at least intuitively) reduces the overall communication latency. We focus on this aspect and propose a novel efficient NUMA-aware scheduling, together with a combination of the major HPC strategies for large-scale LQCD. We reach nearly optimal performance on a single core and a significant scalability improvement on several NUMA nodes. Then, using a classical domain decomposition approach, we extend our scheduling to a large cluster of many-core nodes, thus illustrating the global efficiency of our hybrid implementation.

Citations: 4

ICARO-PAPM: Congestion Management with Selective Queue Power-Gating

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-17 DOI: 10.1109/HPCS.2017.47

J. V. Escamilla, J. Flich, M. Casu

Abstract: The growing demand for performance and advances in technology drive manufacturers to integrate more and more cores on the same die. However, this increase in interconnected computing elements puts more pressure on the network-on-chip, which might saturate, leading to congestion and thus degrading system performance. To deal with this, ICARO was recently proposed as a congestion control mechanism which identifies congested points and isolates congested traffic in separate queues, removing the HoL-blocking effect and hence rendering congestion harmless. However, ICARO's additional buffers incur significant power overhead. In this paper, we propose a new version of ICARO (ICARO-PAPM) which is integrated with a novel path-oriented fine-grained power-gating mechanism (PAPM). PAPM can selectively power on and off paths partially shared by different sources. When driven by ICARO, unused queues for congested traffic can be powered down, thus saving energy. We demonstrate that ICARO-PAPM does not interfere with the original ICARO performance, while it achieves a significant reduction of 35% in power consumption by keeping all additional buffers powered off when no congestion arises on the network, and up to 27% under congested traffic by powering on only those queues needed by the congested traffic.

Citations: 1

Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-17 DOI: 10.1109/HPCS.2017.68

M. A. N. Al-hayanni, R. Shafik, A. Rafiev, F. Xia, A. Yakovlev

Abstract: Traditional speedup models, such as Amdahl's law, facilitate the study of the impact of running parallel workloads on manycore systems. However, these models are typically based on software characteristics, assuming ideal hardware behaviors. As such, the applicability of these models for energy and/or performance-driven system optimization is limited by two factors. Firstly, speedup cannot be measured without instrumenting the original software codes, and secondly, the parallelization factor of an application running on specific hardware is generally unknown. In this paper, we propose a novel method, whereby standard performance counters found in modern many-core platforms can be used to derive speedup without instrumenting applications for time measurements. We postulate that speedup can be accurately estimated as the ratio of the instructions per cycle of a parallel manycore system to the instructions per cycle of a single-core system. By studying the application instructions and system instructions for the first time, our method leads to the determination of the parallelization factor and the optimal system configuration for energy and/or performance. The method is extensively demonstrated through experiments on three different platforms with core numbers ranging from 4 to 61, running parallel benchmark applications (including synthetic and PARSEC benchmarks) on the Linux operating system. Speedup and parallelization estimations using our method and their extensive cross-validations show negligible errors (up to 8%) in these systems. Additionally, we demonstrate the effectiveness of our method to explore parallelization-aware energy-efficient system configurations for many-core systems using energy-delay-product based formulations.
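The relationship stated in this abstract (speedup as a ratio of instructions per cycle) can be illustrated with a minimal Python sketch. The counter values below are made up, the function names are hypothetical, and inverting Amdahl's law to obtain a parallel fraction is an assumption made here for illustration; the abstract does not give the paper's exact formulation.

```python
def ipc(instructions, cycles):
    """Instructions per cycle from two raw counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def estimate_speedup(ipc_parallel, ipc_single):
    """Speedup estimated as IPC(parallel system) / IPC(single core)."""
    if ipc_single <= 0:
        raise ValueError("single-core IPC must be positive")
    return ipc_parallel / ipc_single


def amdahl_speedup(parallel_fraction, cores):
    """Classical Amdahl's law: S = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def amdahl_parallel_fraction(speedup, cores):
    """Invert Amdahl's law for p, given a speedup on `cores` cores.

    Assumption: the workload follows Amdahl's model exactly.
    """
    if cores < 2 or speedup <= 0:
        raise ValueError("need cores >= 2 and speedup > 0")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Hypothetical counter readings (not taken from the paper).
single_core_ipc = ipc(instructions=2.0e9, cycles=1.0e9)
four_core_ipc = ipc(instructions=6.0e9, cycles=1.0e9)  # aggregated over 4 cores

speedup = estimate_speedup(four_core_ipc, single_core_ipc)
p = amdahl_parallel_fraction(speedup, cores=4)
print(f"estimated speedup: {speedup:.3f}, parallel fraction: {p:.3f}")
```

On a real platform the instruction and cycle counts would come from hardware performance counters rather than literals; how they are read is platform-specific and is not covered by this sketch.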

Citations: 8

Evolvable Systems for Big Data Management in Business

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-17 DOI: 10.1109/HPCS.2017.14

R. McClatchey, A. Branson, Jetendr Shamdasani, Patrick Emin

Abstract: Big Data systems increasingly have to be longer lasting, enterprise-wide and interoperable with other (legacy or new) systems. Furthermore, many organizations operate in an external environment which dictates change at an unforeseeable rate and requires evolution in system requirements. In these cases system development does not have a definitive end point; rather, it continues in a mutually constitutive cycle with the organization and its requirements. Also, when the period of design is of such duration that the technology may well evolve, or when the required technology is not mature at the outset, the design process becomes considerably more difficult. The same is true if the system must interoperate with other systems. Ideally, in these circumstances the design must also be able to evolve in order to react to changing technologies and requirements and to ensure traceability between the design and the evolving system specification. For interoperability, Big Data systems need to be discoverable and to work with information about other systems with which they need to cooperate over time. We have developed software called CRISTAL-ISE that enables dynamic system evolution and interoperability for Big Data systems; it has been commercialised as the Agilium-NG BPM product and is outlined in this paper.

Citations: 1

A Case for PARAM Shavak: Ready-to-Use and Affordable Supercomputing Solution

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.66

Sandeep R. Agrawal, Shweta Das, M. Valmiki, Sanjay Wandhekar, R. Moona

Abstract: High Performance Computing (HPC) systems are usually large systems which require specialized infrastructure. For a variety of small-time users who need the performance of parallel computing for their applications, such systems are unaffordable and inaccessible for a number of reasons. Even to set up a small state-of-the-art HPC system, such users would require vast effort and expertise to design system specifications and to identify and install system software, tools and user applications. Going through such a process would also consume time and can be expensive. Clearly, there is a need for a small, low-cost, ready-to-use HPC system which end-users can put to use straight away. In this paper, we present a case for a small, affordable and personalized supercomputing solution named PARAM Shavak [8, 9], which offers a ready-to-use supercomputing-in-a-box solution based on commercial off-the-shelf HPC hardware resources. This solution is intended as a support tool for research, design and development, often related to education or small-time designers. The solution is architected to provide scalability and power efficiency. We also discuss the uniqueness of our solution compared to several related initiatives which have been around, and show its efficacy.

Citations: 6

Implementation and Performance of a GPU-Based Monte-Carlo Framework for Determining Design Ice Load

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.27

Sara Ayubian, Shadi G. Alawneh, M. Richard, Jan Thijssen

Citations: 6

A Parallel RBF Mesh Deformation Method with Multi-greedy Algorithm in OpenFOAM

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.25

Chao Li, Wenjing Yang, Jinyu Wang, Xiaoguang Ren, S. Ye, Yufei Lin

Abstract: The Radial Basis Function (RBF) mesh deformation method has been widely used in CFD simulations with moving boundaries due to its high robustness and accuracy. The original implementation of the RBF mesh deformation method in OpenFOAM (a widely used CFD software) is purely serial, with relatively low computational performance. To reduce the time cost of the mesh motion in large-scale simulations, this paper proposes a parallel RBF mesh deformation method with a multi-greedy algorithm in OpenFOAM. The proposed multi-greedy method can reduce the control points used by the RBF interpolation on both the moving boundary and the static boundary, which makes it more widely applicable than the previous typical greedy algorithm. Based on a master-worker algorithm, the computation of the mesh deformation is highly parallelized. Tests on the benchmark of a three-dimensional moving fish show that, with an error tolerance of 1e-4, the interpolation time of the internal mesh motion using our multi-greedy method is about 10.2 times faster than the original one, and with a parallelism of 132, the time cost of the whole mesh motion is greatly reduced, with a speedup of 37.
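To make the idea of greedy control-point reduction concrete, the following is a minimal NumPy sketch of the classic single-level greedy selection for RBF interpolation of boundary displacements. It is not the authors' multi-greedy variant, whose details the abstract does not give, and it is not OpenFOAM code. The Wendland C2 kernel, the `radius` parameter and all function names are assumptions chosen for illustration.

```python
import numpy as np


def wendland_c2(r, radius):
    """Compactly supported Wendland C2 kernel: (1 - x)^4 (4x + 1) for x < 1."""
    xi = r / radius
    return np.where(xi < 1.0, (1.0 - xi) ** 4 * (4.0 * xi + 1.0), 0.0)


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def greedy_select(points, displacements, radius, tol=1e-4, max_points=None):
    """Pick control points one at a time until the interpolation error <= tol.

    Each pass fits the RBF on the current control points, evaluates the
    interpolant at all boundary points, and adds the worst-fitted point.
    Returns (selected indices, RBF weights, largest remaining error).
    """
    n = len(points)
    max_points = n if max_points is None else max_points
    # Start from the point with the largest prescribed displacement.
    selected = [int(np.argmax(np.linalg.norm(displacements, axis=1)))]
    while True:
        centers = points[selected]
        kernel = wendland_c2(pairwise_dist(centers, centers), radius)
        weights = np.linalg.solve(kernel, displacements[selected])
        predicted = wendland_c2(pairwise_dist(points, centers), radius) @ weights
        err = np.linalg.norm(predicted - displacements, axis=1)
        err[selected] = 0.0  # control points are interpolated by construction
        worst = int(np.argmax(err))
        if err[worst] <= tol or len(selected) >= max_points:
            return selected, weights, float(err[worst])
        selected.append(worst)


# Hypothetical boundary: a 5 x 5 grid of points with a smooth displacement.
xs, ys = np.meshgrid(np.linspace(0.0, 2.0, 5), np.linspace(0.0, 2.0, 5))
boundary = np.column_stack([xs.ravel(), ys.ravel()])
motion = np.column_stack([0.1 * np.sin(boundary[:, 0]), 0.05 * boundary[:, 1]])

chosen, rbf_weights, max_err = greedy_select(boundary, motion, radius=3.0, tol=1e-3)
print(f"{len(chosen)} of {len(boundary)} points kept, max error {max_err:.2e}")
```

Each pass solves a dense system on the selected points only, so the cost is governed by the number of control points kept rather than by the full boundary size, which is why reducing them matters for large meshes.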

Citations: 1

A Performance Evaluation of an Automatic Web Services Composition System

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.127

A. Pinto, O. Carpinteiro, B. Batista, Dionisio Machado Leite Filho, M. Peixoto, B. Kuehne

Citations: 2