2017 International Conference on High Performance Computing & Simulation (HPCS): Latest Papers

Accelerating Matrix Multiplication in Deep Learning by Using Low-Rank Approximation
Kazuki Osawa, Akira Sekiya, Hiroki Naganuma, Rio Yokota
DOI: 10.1109/HPCS.2017.37 (https://doi.org/10.1109/HPCS.2017.37)
Published: 2017-07-17
Abstract: Open-source deep learning frameworks such as TensorFlow, Caffe, and Torch are widely used around the world, so accelerating them is highly valuable. In these frameworks, much of the computation time is spent on convolution, and highly tuned libraries such as cuDNN play an important role in accelerating it. These libraries, however, perform convolution without approximating the dense matrices involved. In this research, we propose introducing low-rank approximation, a method widely used in scientific and technical computing, into the convolution computation. Investigating the effect on the recognition accuracy of an existing model shows that the rank of the data matrices can be reduced by up to about 90% while keeping recognition accuracy within 2% of the baseline.
Citations: 12
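The cost argument behind low-rank matrix multiplication can be illustrated with a small sketch. It is not the authors' implementation: it assumes the rank-r factors U and V are already available (the abstract does not say how they are obtained; a truncated SVD is one common choice), and every name in it (`matmul`, `multiply_adds`, `U`, `V`, `r`) is hypothetical. The matrix A is built to be exactly rank r, so the factored product matches the dense one up to rounding.

```python
import random


def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def multiply_adds(m, k, n):
    """Multiply-add count of an (m x k) by (k x n) dense product."""
    return m * k * n


random.seed(0)
m, k, n, r = 32, 32, 32, 3  # r is roughly 10% of the full dimension (assumed)

# Hypothetical rank-r factors; A = U V is exactly rank r by construction.
U = [[random.gauss(0, 1) for _ in range(r)] for _ in range(m)]
V = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]
A = matmul(U, V)
B = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]

dense = matmul(A, B)                # one (m x k)(k x n) product
factored = matmul(U, matmul(V, B))  # (r x k)(k x n), then (m x r)(r x n)

max_err = max(
    abs(x - y) for row_d, row_f in zip(dense, factored) for x, y in zip(row_d, row_f)
)
dense_cost = multiply_adds(m, k, n)
factored_cost = multiply_adds(r, k, n) + multiply_adds(m, r, n)
print(f"max error {max_err:.2e}, cost ratio {factored_cost / dense_cost:.3f}")
```

For a matrix that is only approximately low rank, the factored product would carry an approximation error, which is the accuracy trade-off the abstract reports.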
Formalization of a Big Graph API in Coq
Jolan Philippe, Wadoud Bousdira, F. Loulergue
DOI: 10.1109/HPCS.2017.140 (https://doi.org/10.1109/HPCS.2017.140)
Published: 2017-07-17
Abstract: We now live surrounded by sensors, we create information continuously, and we constantly leave computer traces of our activities. The processing and analysis of this huge volume of data, so-called Big Data, offer innumerable and still largely unexplored opportunities: health (epidemiology, genomics), complex energy networks, intelligent cities, forecasting and management of environmental risks, etc. Big Data has, and will increasingly have, a very significant impact at the societal, economic, and commercial levels. Many interesting Big Data problems can be modeled as problems on graphs/networks.
Citations: 0
Scalable NUMA-Aware Wilson-Dirac on Supercomputers
C. Tadonki
DOI: 10.1109/HPCS.2017.56 (https://doi.org/10.1109/HPCS.2017.56)
Published: 2017-07-17
Abstract: We revisit the Wilson-Dirac operator, also referred to as Dslash, on NUMA many-core vector machines and seek an efficient supercomputing implementation. Quantum ChromoDynamics (QCD) is the theory of the strong nuclear force, and its discrete formalism is Lattice Quantum ChromoDynamics (LQCD). Wilson-Dirac is the major computing kernel in LQCD, where special attention is paid to large-scale simulations. The corresponding computing demand is tremendous at various levels, from storage to floating-point operations, hence the crucial need for powerful supercomputers. Designing efficient LQCD codes on modern (mostly hybrid) supercomputers requires efficiently exploiting all available levels of parallelism, including accelerators. Since Wilson-Dirac is a coarse-grain stencil computation performed on a huge volume of data, any performance or scalability investigation should carefully address memory accesses and interprocessor communication overheads. To lower the latter, explicit shared-memory implementations should be considered at the level of a compute node, since this leads to a less complex data communication graph and thus (at least intuitively) reduces the overall communication latency. We focus on this aspect and propose a novel, efficient NUMA-aware scheduling, together with a combination of the major HPC strategies for large-scale LQCD. We reach nearly optimal performance on a single core and a significant scalability improvement on several NUMA nodes. Then, using a classical domain decomposition approach, we extend our scheduling to a large cluster of many-core nodes, thus illustrating the global efficiency of our hybrid implementation.
Citations: 4
ICARO-PAPM: Congestion Management with Selective Queue Power-Gating
J. V. Escamilla, J. Flich, M. Casu
DOI: 10.1109/HPCS.2017.47 (https://doi.org/10.1109/HPCS.2017.47)
Published: 2017-07-17
Abstract: The growing demand for performance, together with technology advances, drives manufacturers to integrate more and more cores on the same die. However, this increase in interconnected computing elements puts more pressure on the network-on-chip, which might saturate, leading to congestion and thus degrading system performance. To deal with this, ICARO was recently proposed as a congestion control mechanism that identifies congested points and isolates congested traffic in separate queues, removing the HoL-blocking effect and rendering congestion harmless. However, ICARO's additional buffers incur significant power overhead. In this paper, we propose a new version of ICARO (ICARO-PAPM) that is integrated with a novel path-oriented, fine-grained power-gating mechanism (PAPM). PAPM can selectively power on and off paths partially shared by different sources. When driven by ICARO, unused queues for congested traffic can be powered down, thus saving energy. We demonstrate that ICARO-PAPM does not interfere with the original ICARO performance, while it reduces power consumption by 35% by keeping all additional buffers powered off when no congestion arises on the network, and by up to 27% under congested traffic by powering on only those queues needed by the congested traffic.
Citations: 1
Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters
M. A. N. Al-hayanni, R. Shafik, A. Rafiev, F. Xia, A. Yakovlev
DOI: 10.1109/HPCS.2017.68 (https://doi.org/10.1109/HPCS.2017.68)
Published: 2017-07-17
Abstract: Traditional speedup models, such as Amdahl's law, facilitate the study of the impact of running parallel workloads on many-core systems. However, these models are typically based on software characteristics and assume ideal hardware behavior. As such, the applicability of these models to energy- and/or performance-driven system optimization is limited by two factors. First, speedup cannot be measured without instrumenting the original software code; second, the parallelization factor of an application running on specific hardware is generally unknown. In this paper, we propose a novel method whereby standard performance counters found in modern many-core platforms can be used to derive speedup without instrumenting applications for time measurements. We postulate that speedup can be accurately estimated as the ratio of the instructions per cycle of a parallel many-core system to the instructions per cycle of a single-core system. By studying application instructions and system instructions for the first time, our method leads to the determination of the parallelization factor and of the optimal system configuration for energy and/or performance. The method is extensively demonstrated through experiments on three different platforms with core counts ranging from 4 to 61, running parallel benchmark applications (including synthetic and PARSEC benchmarks) on the Linux operating system. Speedup and parallelization estimates obtained with our method, and their extensive cross-validation, show negligible errors (up to 8%) on these systems. Additionally, we demonstrate the effectiveness of our method for exploring parallelization-aware, energy-efficient configurations of many-core systems using formulations based on the energy-delay product.
Citations: 8
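The speedup estimate described in this abstract can be sketched as follows, under stated assumptions. The IPC ratio comes from the abstract; inverting Amdahl's law to recover a parallel fraction is a standard textbook step and may differ from the paper's own derivation, which also separates application and system instructions. The counter readings and all function names are invented for illustration.

```python
def ipc(instructions, cycles):
    """Instructions per cycle from two raw counter readings."""
    return instructions / cycles


def speedup_from_ipc(ipc_parallel, ipc_single):
    """Speedup estimated as the ratio of parallel IPC to single-core IPC."""
    return ipc_parallel / ipc_single


def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's law to recover p from a measured speedup on n cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Hypothetical counter readings, invented for this example.
cores = 4
ipc_single = ipc(8.0e9, 1.0e10)   # same work, single core
ipc_parallel = ipc(8.0e9, 4.0e9)  # same work, fewer elapsed cycles on 4 cores

speedup = speedup_from_ipc(ipc_parallel, ipc_single)
p = parallel_fraction(speedup, cores)
print(f"speedup {speedup:.2f}, parallel fraction {p:.2f}")
```

Feeding the recovered fraction back into `amdahl_speedup(p, cores)` reproduces the measured speedup, which is a quick consistency check on the inversion.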
Evolvable Systems for Big Data Management in Business
R. McClatchey, A. Branson, Jetendr Shamdasani, Patrick Emin
DOI: 10.1109/HPCS.2017.14 (https://doi.org/10.1109/HPCS.2017.14)
Published: 2017-07-17
Abstract: Big Data systems increasingly have to be longer lasting, enterprise-wide, and interoperable with other (legacy or new) systems. Furthermore, many organizations operate in an external environment that dictates change at an unforeseeable rate and requires evolution in system requirements. In these cases, system development does not have a definitive end point; rather, it continues in a mutually constitutive cycle with the organization and its requirements. When the design period is long enough that the technology may well evolve, or when the required technology is not mature at the outset, the design process becomes considerably more difficult. It becomes more difficult still if the system must interoperate with other systems. Ideally, in these circumstances the design must also be able to evolve in order to react to changing technologies and requirements and to ensure traceability between the design and the evolving system specification. For interoperability, Big Data systems need to be discoverable and to work with information about other systems with which they need to cooperate over time. We have developed software called CRISTAL-ISE that enables dynamic system evolution and interoperability for Big Data systems; it has been commercialised as the Agilium-NG BPM product and is outlined in this paper.
Citations: 1
A Case for PARAM Shavak: Ready-to-Use and Affordable Supercomputing Solution
Sandeep R. Agrawal, Shweta Das, M. Valmiki, Sanjay Wandhekar, R. Moona
DOI: 10.1109/HPCS.2017.66 (https://doi.org/10.1109/HPCS.2017.66)
Published: 2017-07-01
Abstract: High Performance Computing (HPC) systems are usually large systems that require specialized infrastructure. For a variety of small-time users who need parallel computing performance for their applications, such systems are unaffordable and inaccessible for a number of reasons. Even to set up a small state-of-the-art HPC system, such users would need considerable effort and expertise to design the system specification and to identify and install system software, tools, and user applications. Going through such a process also takes time and can be expensive. Clearly, there is a need for a small, low-cost, ready-to-use HPC system that end users can put to use straight away. In this paper, we present the case for a small, affordable, and personalized supercomputing solution named PARAM Shavak [8, 9], which offers a ready-to-use supercomputing-in-a-box solution based on commercial off-the-shelf HPC hardware. The solution is intended as a support tool for research, design, and development, often in education or for small-time designers. It is architected to provide scalability and power efficiency. We also discuss what distinguishes our solution from several existing related initiatives and show its efficacy.
Citations: 6
Implementation and Performance of a GPU-Based Monte-Carlo Framework for Determining Design Ice Load
Sara Ayubian, Shadi G. Alawneh, M. Richard, Jan Thijssen
DOI: 10.1109/HPCS.2017.27 (https://doi.org/10.1109/HPCS.2017.27)
Published: 2017-07-01
Abstract: Modern Graphics Processing Units (GPUs), with a massive number of threads and a many-core architecture, support both graphics and general-purpose computing. NVIDIA's Compute Unified Device Architecture (CUDA) takes advantage of parallel computing and utilizes the tremendous power of GPUs. The present study demonstrates a high performance computing (HPC) framework for a Monte-Carlo simulation to determine design sea ice loads, implemented on both GPU and CPU. Results show a speedup of up to 130 times for four Tesla K80 GPUs over an optimized CPU OpenMP implementation, and a speedup of up to 8 times for four Tesla K80 GPUs over a single Tesla K80 GPU implementation. The elapsed time of the different implementations has been reduced from about 2.5 hours to 0.7 seconds.
Citations: 6
A Parallel RBF Mesh Deformation Method with Multi-greedy Algorithm in OpenFOAM
Chao Li, Wenjing Yang, Jinyu Wang, Xiaoguang Ren, S. Ye, Yufei Lin
DOI: 10.1109/HPCS.2017.25 (https://doi.org/10.1109/HPCS.2017.25)
Published: 2017-07-01
Abstract: The Radial Basis Function (RBF) mesh deformation method has been widely used in CFD simulations with moving boundaries because of its high robustness and accuracy. The original implementation of the RBF mesh deformation method in OpenFOAM (a widely used CFD package) is purely serial, with relatively low computational performance. To reduce the time cost of mesh motion in large-scale simulations, this paper proposes a parallel RBF mesh deformation method with a multi-greedy algorithm in OpenFOAM. The proposed multi-greedy method can reduce the number of control points used by the RBF interpolation on both the moving boundary and the static boundary, which makes it more widely applicable than the previous typical greedy algorithm. Based on a master-worker algorithm, the computation of the mesh deformation is highly parallelized. Tests on a benchmark of a three-dimensional moving fish show that, with an error tolerance of 1e-4, the interpolation of the internal mesh motion using our multi-greedy method is about 10.2 times faster than the original one, and that with a parallelism of 132, the time cost of the whole mesh motion is greatly reduced, with a speedup of 37.
Citations: 1
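To make the role of greedy control-point selection concrete, here is a minimal one-dimensional sketch. It shows the classic single greedy loop that the abstract compares against, not the paper's multi-greedy or parallel method, whose details the abstract does not give. The Wendland C2 kernel, the support radius, the sample displacement field, and all function names are assumptions chosen for illustration; OpenFOAM is not used.

```python
import math


def wendland_c2(r, radius=1.0):
    """Compactly supported Wendland C2 kernel (an assumed choice of RBF)."""
    s = r / radius
    return (1.0 - s) ** 4 * (4.0 * s + 1.0) if s < 1.0 else 0.0


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(points, values, chosen):
    """RBF weights that reproduce the displacements at the chosen points."""
    a = [[wendland_c2(abs(points[i] - points[j])) for j in chosen] for i in chosen]
    return solve(a, [values[i] for i in chosen])


def evaluate(x, points, chosen, weights):
    """Interpolated displacement at position x."""
    return sum(w * wendland_c2(abs(x - points[j])) for w, j in zip(weights, chosen))


def greedy_select(points, values, tol):
    """Add the worst-approximated boundary point until the max error <= tol."""
    chosen = [max(range(len(points)), key=lambda i: abs(values[i]))]
    while True:
        weights = fit(points, values, chosen)
        errors = [abs(evaluate(p, points, chosen, weights) - v)
                  for p, v in zip(points, values)]
        max_err = max(errors)
        if max_err <= tol or len(chosen) == len(points):
            return chosen, weights, max_err
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: errors[i]))


# Hypothetical 1D boundary with a smooth prescribed displacement.
points = [i / 10 for i in range(11)]
values = [0.05 * math.sin(math.pi * x) for x in points]
chosen, weights, max_err = greedy_select(points, values, tol=1e-4)
print(f"{len(chosen)} of {len(points)} control points, max error {max_err:.1e}")
```

Fewer control points mean a smaller linear system to solve and fewer kernel evaluations per interior mesh node, which is where the interpolation-time savings reported in the abstract would come from.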
A Performance Evaluation of an Automatic Web Services Composition System
A. Pinto, O. Carpinteiro, B. Batista, Dionisio Machado Leite Filho, M. Peixoto, B. Kuehne
DOI: 10.1109/HPCS.2017.127 (https://doi.org/10.1109/HPCS.2017.127)
Published: 2017-07-01
Abstract: The automatic composition of Web Services has been explored in the literature from different standpoints. It aims to create an execution plan for a flow of Web Services based on requests sent by the client, by following the stages necessary to generate composite services and then executing the workflow that has been designed. However, no research studies have been found that cover the whole process of automatic composition and execution, from the user's request to the execution of the services chosen as a solution. Therefore, the goal of this paper is to evaluate the performance of an automatic Web Service composition, from the request made by the client to the delivery of the results of the executed composition. This article examines the integration of two tools, the automatic Web Service composition system and the extensible platform for evaluating semantic Web Services, with the aim of conducting a performance evaluation of the entire process of automatic composition of Web Services.
Citations: 2