2017 International Conference on High Performance Computing & Simulation (HPCS) — Latest Publications

MERCATOR: A GPGPU Framework for Irregular Streaming Applications
Stephen V. Cole, J. Buhler
DOI: https://doi.org/10.1109/HPCS.2017.111
Abstract: GPUs have a natural affinity for streaming applications exhibiting consistent, predictable dataflow. However, many high-impact irregular streaming applications, including sequence pattern matching, decision-tree and decision-cascade evaluation, and large-scale graph processing, exhibit unpredictable dataflow due to data-dependent filtering or expansion of the data stream. Existing GPU frameworks do not support arbitrary irregular streaming dataflow tasks, and developing application-specific optimized implementations for such tasks requires expert GPU knowledge. We introduce MERCATOR, a lightweight framework supporting modular CUDA streaming application development for irregular applications. A developer can use MERCATOR to decompose an irregular application for the GPU without explicitly remapping work to threads at runtime. MERCATOR applications are efficiently parallelized on the GPU through a combination of replication across blocks and queueing between nodes to accommodate irregularity. We quantify the performance impact of MERCATOR's support for irregularity and illustrate its utility by implementing a biological sequence comparison pipeline similar to the well-known NCBI BLASTN algorithm. MERCATOR code is available by request to the first author.
Citations: 12
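The irregularity MERCATOR targets — nodes that filter items out or expand one item into several, absorbed by queues between nodes — can be illustrated with a small CPU sketch. This is a toy model of the dataflow pattern only; the node names and `run_pipeline` function are made up here and are not MERCATOR's actual CUDA API.

```python
from collections import deque

def run_pipeline(items, nodes):
    """Push items through a chain of nodes; each node maps one input to
    zero or more outputs, so queue lengths vary in a data-dependent way."""
    queue = deque(items)
    for node in nodes:
        next_queue = deque()
        while queue:
            # 0..n outputs per input: the source of irregularity
            next_queue.extend(node(queue.popleft()))
        queue = next_queue
    return list(queue)

# Data-dependent filtering and expansion, the two irregular behaviors
# the abstract names.
drop_odd = lambda x: [x] if x % 2 == 0 else []
duplicate = lambda x: [x, x + 1]

result = run_pipeline(range(6), [drop_odd, duplicate])
```

On the GPU, the point of the queues is to re-pack such variable-size outputs so that full thread blocks stay busy despite the filtering.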
PHAST Library — Enabling Single-Source and High Performance Code for GPUs and Multi-cores
Biagio Peccerillo, S. Bartolini
DOI: https://doi.org/10.1109/HPCS.2017.109
Abstract: The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted to the specific architectures and effectively tuned on each of them, while avoiding biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits writing code at a high level of abstraction and reaching good performance, while allowing fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that the PHAST implementation is around ten times faster than the OpenCL (AMD vendor) implementation, but up to about 4x slower than the OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than the CUDA SDK reference version.
Citations: 6
Large-Scale Memory of Sequences Using Binary Sparse Neural Networks on GPU
M. R. S. Marques, G. B. Hacene, C. Lassance, Pierre-Henri Horrein
DOI: https://doi.org/10.1109/HPCS.2017.88
Abstract: Associative memories are models capable of storing and retrieving messages given only a part of their content. These systems have been used in several applications such as database engines, network routers, natural language processing and image recognition due to their error-correction capability in pattern retrieval. Recently, Gripon and Berrou introduced a sparse associative memory based on cliques which achieves almost optimal storage efficiency (ratio of useful bits stored to bits used). The Binary Tournament-based Neural Network is an extension of clique-based sparse associative memories used to store and retrieve long sequences of patterns. This paper proposes solutions to increase the retrieval performance of a parallel architecture for this memory, together with a scalable and optimized implementation of the required algorithms.
Citations: 1
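The clique-based memory the abstract builds on can be sketched in a few lines: a message selects one unit per cluster and is stored as a clique (all-to-all edges) among the selected units; retrieval scores candidate units by how many edges connect them to the known part of the message. This is a minimal toy in the spirit of Gripon-Berrou, with made-up sizes, not the paper's tournament-based or GPU version.

```python
import itertools

C, L = 4, 8  # hypothetical sizes: C clusters of L units each

def store(messages):
    """Each message is a tuple of unit indices, one per cluster; storing
    it adds the clique of edges among the chosen (cluster, unit) pairs."""
    edges = set()
    for msg in messages:
        units = [(c, msg[c]) for c in range(C)]
        for a, b in itertools.combinations(units, 2):
            edges.add((a, b)); edges.add((b, a))
    return edges

def retrieve(edges, partial):
    """partial: dict cluster -> known unit. Each missing cluster is filled
    with the unit receiving the most edges from the known units."""
    known = [(c, u) for c, u in partial.items()]
    msg = dict(partial)
    for c in range(C):
        if c in partial:
            continue
        score = lambda u: sum(((kc, ku), (c, u)) in edges for kc, ku in known)
        msg[c] = max(range(L), key=score)
    return tuple(msg[c] for c in range(C))

net = store([(1, 3, 5, 7), (2, 2, 2, 2)])
recovered = retrieve(net, {0: 1, 1: 3})  # half the message is enough
```

The storage efficiency the abstract mentions comes from the edges being binary and shared: each stored clique costs only a handful of bits in the adjacency structure.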
Understanding the Performances of Sparse Compression Formats Using Data Parallel Programming Model
Ichrak Mehrez, O. Hamdi-Larbi, T. Dufaud, N. Emad
DOI: https://doi.org/10.1109/HPCS.2017.103
Abstract: Several applications in numerical scientific computing involve very large sparse matrices with a regular or irregular sparse structure. These matrices can be stored using special compression formats (storing only non-zero elements) to reduce memory space and processing time. The choice of the optimal format is a critical process that involves several criteria. The general context of this work is to propose an auto-tuner system that, given a sparse matrix, a numerical method, a parallel programming model and an architecture, can automatically select the Optimal Compression Format (OCF). In this paper we study the performance of two different sparse compression formats, namely CSR (Compressed Sparse Row) and CSC (Compressed Sparse Column). We propose data parallel algorithms for the Sparse Matrix-Vector Product in the case of each format, and we extract a set of parameters that can help select the most suitable compression format for a given sparse matrix.
Citations: 3
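The two formats the paper compares differ in how the sparse matrix-vector product traverses the matrix: CSR computes one dot product per row, while CSC scatters one column at a time into the result. A minimal sequential sketch (not the paper's data-parallel algorithms):

```python
import numpy as np

def spmv_csr(vals, col_idx, row_ptr, x):
    """CSR SpMV: row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):           # one dot product per row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_csc(vals, row_idx, col_ptr, x, n_rows):
    """CSC SpMV: col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j."""
    y = np.zeros(n_rows)
    for j in range(len(col_ptr) - 1):           # scatter one column at a time
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += vals[k] * x[j]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
x = np.array([1.0, 1.0, 1.0])
y_csr = spmv_csr([1, 2, 3], [0, 2, 1], [0, 2, 3], x)
y_csc = spmv_csc([1, 3, 2], [0, 1, 0], [0, 1, 2, 3], x, 2)
```

The parallelization trade-off follows directly: CSR rows are independent (easy to parallelize, but load-imbalanced for irregular structures), while CSC columns write to shared output entries and need atomics or a reduction.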
Analyzing Performance of Multi-cores and Applications with Cache-aware Roofline Model
Diogo Marques, Helder Duarte, L. Sousa, A. Ilic
DOI: https://doi.org/10.1109/HPCS.2017.158
Abstract: To satisfy the growing computational demands of modern applications, significant enhancements have been introduced in contemporary processor architectures with the aim of increasing their attainable performance, such as an increased number of cores, improved memory-subsystem capability and enhancements in the processor pipeline [1]. These performance improvements are usually coupled with increased complexity at the architecture level, which imposes additional challenges when designing, prototyping and optimizing the execution of real-world applications on a given compute platform. Since application performance depends on multiple factors, e.g., multi-threading, vectorization efficiency and memory accesses, achieving the most efficient execution is not a trivial task, especially when aiming at fully exploiting the capabilities of modern multi-core processors.
Citations: 0
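The Roofline model the title refers to bounds attainable performance by the lower of peak compute and memory bandwidth times arithmetic intensity; the cache-aware variant draws one bandwidth ceiling per memory level. A sketch with illustrative (not measured) machine numbers:

```python
def roofline(ai, peak_gflops, bw_gbytes):
    """Attainable GFLOP/s for one ceiling: ai is FLOPs per byte moved."""
    return min(peak_gflops, ai * bw_gbytes)

PEAK = 500.0                                          # hypothetical GFLOP/s
LEVELS = {"L1": 1000.0, "L2": 400.0, "DRAM": 50.0}    # hypothetical GB/s

# A kernel with 2 FLOPs/byte is compute-bound from L1 but heavily
# bandwidth-bound if its working set spills to DRAM.
ceilings = {lvl: roofline(2.0, PEAK, bw) for lvl, bw in LEVELS.items()}
```

Placing a measured kernel against these per-level ceilings is what tells the analyst whether to chase vectorization (compute-bound) or locality (bandwidth-bound).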
High Performance Recursive Matrix Inversion for Multicore Architectures
R. Mahfoudhi, Sami Achour, O. Hamdi-Larbi, Z. Mahjoub
DOI: https://doi.org/10.1109/HPCS.2017.104
Abstract: There are several approaches for computing the inverse of a dense square matrix A, namely Gaussian elimination, block-wise inversion, and LU factorization (LUF). The latter is used in mathematical software libraries such as SCALAPACK, PBLAS and MATLAB. The inversion routine in the SCALAPACK library (PDGETRI) consists, once the two factors L and U are known (where A = LU), in first inverting U, then solving a triangular matrix system giving A−1. A symmetric way consists in first inverting L, then solving a matrix system giving A−1. Alternatively, one could compute the inverses of both U and L, then their product, and get A−1. On the other hand, the Strassen fast matrix inversion algorithm is known as an efficient alternative for solving our problem. We propose in this paper a series of different versions of parallel dense matrix inversion based on the 'Divide and Conquer' paradigm. A theoretical performance study establishes an accurate comparison between the designed algorithms. We conducted a series of experiments that validate the contribution and yield efficient performance for large matrix sizes, i.e. up to 40% faster than SCALAPACK.
Citations: 3
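The block-wise, divide-and-conquer approach the abstract lists splits A into 2x2 blocks and recurses via the Schur complement. A serial sketch (the paper's contribution is the parallelization, not this recurrence itself):

```python
import numpy as np

def block_inv(A):
    """Recursive 2x2-block inversion: invert A11, form the Schur
    complement S = A22 - A21 A11^-1 A12, invert S, assemble the blocks."""
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    m = n // 2
    A11, A12 = A[:m, :m], A[:m, m:]
    A21, A22 = A[m:, :m], A[m:, m:]
    B11 = block_inv(A11)
    S = A22 - A21 @ B11 @ A12              # Schur complement of A11
    S_inv = block_inv(S)
    top_left = B11 + B11 @ A12 @ S_inv @ A21 @ B11
    top_right = -B11 @ A12 @ S_inv
    bottom_left = -S_inv @ A21 @ B11
    return np.block([[top_left, top_right], [bottom_left, S_inv]])

# Diagonally dominant test matrix, so every A11 and S stays invertible.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 8 * np.eye(8)
err = np.max(np.abs(block_inv(A) @ A - np.eye(8)))
```

The two recursive calls (on A11 and on S) are independent of neither each other nor the surrounding products, which is exactly why scheduling the block operations across cores is the interesting part.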
Cryptanalysis on GPUs with the Cube Attack: Design, Optimization and Performances Gains
Marco Cianfriglia, Stefano Guarino
DOI: https://doi.org/10.1109/HPCS.2017.114
Abstract: The cube attack is a flexible cryptanalysis technique, with a simple and fascinating theoretical implant. It combines offline exhaustive searches over selected tweakable public/IV bits (the sides of the "cube"), with an online key-recovery phase. Although virtually applicable to any cipher, and generally praised by the research community, the real potential of the attack is still in question, and no implementation so far has succeeded in breaking a real-world strong cipher. In this paper, we present, validate and analyze the first thorough implementation of the cube attack on a GPU cluster. The framework is conceived so as to be usable out-of-the-box for any cipher featuring up to a 128-bit key and IV, and easily adaptable to larger keys/IVs at just the cost of some fine (performance) tuning, mostly related to memory allocation. As a test case, we consider previous state-of-the-art results against a reduced-round version of a well-known cipher (Trivium). We evaluate the computational speedup with respect to a CPU-parallel benchmark, the performance dependence on system parameters and GPU architectures (Nvidia Kepler vs Nvidia Pascal), and the scalability of our solution on multi-GPU systems. All design choices are carefully described, and their respective advantages and drawbacks are discussed. By exhibiting the benefits of a complete GPU-tailored implementation of the cube attack, we provide novel and strong elements in support of the general feasibility of the attack, thus paving the way for future work in the area.
Citations: 4
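The offline phase the abstract describes can be shown on a toy scale: XOR-summing the cipher's output bit over all assignments of the chosen cube (IV) bits cancels every term not divisible by the full cube monomial, leaving the cube's "superpoly" in the key bits. The polynomial below is an invented example, not Trivium.

```python
import itertools

def f(k, v):
    """Toy cipher output bit over key bits k and IV bits v:
    f = v0*v1*(k0 ^ k2) ^ v0*k1 ^ v2 ^ k0*k1*k2."""
    return ((v[0] & v[1] & (k[0] ^ k[2])) ^ (v[0] & k[1])
            ^ v[2] ^ (k[0] & k[1] & k[2]))

def cube_sum(k, cube_bits, n_iv=3):
    """XOR f over all values of the cube bits; other IV bits held at 0."""
    acc = 0
    for assign in itertools.product([0, 1], repeat=len(cube_bits)):
        v = [0] * n_iv
        for bit, val in zip(cube_bits, assign):
            v[bit] = val
        acc ^= f(k, v)
    return acc

# Summing over the cube {v0, v1}: the v0*k1 term is XORed an even number
# of times and cancels, so the sum equals the superpoly k0 ^ k2.
```

The GPU angle is that these exhaustive sums over cube vertices are embarrassingly parallel, which is what the paper's cluster implementation exploits at real cipher sizes.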
DGS-SMS Compact Fifth Order Low Pass Filter
Heba El-Halabi, H. Issa, D. Kaddour, E. Pistono, P. Ferrari
DOI: https://doi.org/10.1109/HPCS.2017.49
Abstract: This paper presents slow-wave microstrip transmission lines based on a double-layer PCB substrate. The slow-wave effect is obtained by inserting metallic via rows within the lower substrate layer. Based on this concept, miniaturized high-rejection 5th-order low-pass stepped-impedance filters with a cut-off frequency of 2.45 GHz are designed, fabricated and measured. Measurement results are in good agreement with simulations. Thanks to the slow-wave transmission lines, the size of the realized filters is reduced by 54% as compared to a classical filter prototype.
Citations: 0
Information Retrieval Based on Description Logic: Application to Biomedical Documents
Kabil Boukhari, Mohamed Nazih Omri
DOI: https://doi.org/10.1109/HPCS.2017.128
Abstract: Document indexing is a fairly sensitive phase in information retrieval. However, the terms present in a document are not sufficient to completely represent it, so the exploitation of implicit information through external resources is necessary for better indexing. For this purpose, a new indexing model for biomedical documents based on description logics is proposed to generate relevant indexes. The documents and the external resource are represented by descriptive expressions: a first, statistical phase assigns an importance degree to each term in the document, and a semantic phase extracts the most important concepts of the MeSH thesaurus (Medical Subject Headings). The concept-extraction step uses description logics to combine the statistical and semantic approaches, followed by a cleaning step to select the most important indexes for the document representation. For the experimental phase, we used the OHSUMED collection; the results showed the effectiveness of the proposed approach and the importance of using description logics in the indexing process.
Citations: 5
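The abstract's statistical phase assigns each term an importance degree. TF-IDF is the standard such weighting and serves as an illustrative stand-in here; the paper's exact formula may differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Weight of term t in a document is its
    term frequency times log(N / document frequency of t)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return weights

# Toy biomedical snippets; terms appearing in fewer documents score higher.
docs = [["aspirin", "headache", "dose"],
        ["aspirin", "fever"],
        ["fever", "thermometer", "fever"]]
w = tf_idf(docs)
```

In the proposed model these statistical weights are then combined, via the description-logic expressions, with the semantic degrees coming from the MeSH concepts.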
Efficient Initial Guess Determination Based on 3D Point Cloud Projection for ICP Algorithms
Mouna Attia, Y. Slama
DOI: https://doi.org/10.1109/HPCS.2017.122
Abstract: The standard Iterative Closest Point (ICP) algorithm is a robust and efficient rigid registration algorithm for 3D point clouds. Nevertheless, its efficiency notably decreases when applied to large-transformation cases. In order to avoid this drawback and improve its performance, we propose a new 3-step approach based on ICP and point cloud projection, called ICP-CP, that both enhances accuracy in most cases and reduces the execution time. The first step consists in determining the best projection plane, i.e. the one that preserves the topological structure of the point cloud. In the second, we compute an initial guess using the projected points. The third performs successive iterations until reaching the best transformation. The novelty of our approach is the use of the projected points to find an adequate initialization for the ICP algorithm, ensuring fast convergence. We conducted a series of experiments on both benchmarks and real clouds to validate our contribution and show that, in most cases, the proposed algorithm is more accurate and robust than standard ICP in a variety of situations, namely large translation, large rotation, and both simultaneously.
Citations: 13
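The value of a good initial guess is easiest to see with the standard SVD (Kabsch) estimate of a rigid transform from corresponding points: given correspondences, it recovers even a large rotation and translation in closed form, which is the kind of starting point that lets ICP converge quickly. This is a generic stand-in, not the paper's projection-based ICP-CP procedure itself.

```python
import numpy as np

def rigid_guess(P, Q):
    """Find R, t minimizing ||R @ P + t - Q|| for 3xN corresponding points
    (Kabsch): align centroids, then take R from the SVD of the covariance."""
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

rng = np.random.default_rng(1)
P = rng.standard_normal((3, 50))
theta = 1.2                                    # a large rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([[5.0], [-2.0], [3.0]])      # a large translation
R, t = rigid_guess(P, R_true @ P + t_true)
```

ICP's hard part is that correspondences are unknown; the paper's contribution is obtaining a usable initial transform from 2D projections before any such closed-form alignment can be trusted.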