2017 International Conference on High Performance Computing & Simulation (HPCS) — Latest Publications

MERCATOR: A GPGPU Framework for Irregular Streaming Applications
Stephen V. Cole, J. Buhler
DOI: https://doi.org/10.1109/HPCS.2017.111
Abstract: GPUs have a natural affinity for streaming applications exhibiting consistent, predictable dataflow. However, many high-impact irregular streaming applications, including sequence pattern matching, decision-tree and decision-cascade evaluation, and large-scale graph processing, exhibit unpredictable dataflow due to data-dependent filtering or expansion of the data stream. Existing GPU frameworks do not support arbitrary irregular streaming dataflow tasks, and developing application-specific optimized implementations for such tasks requires expert GPU knowledge. We introduce MERCATOR, a lightweight framework supporting modular CUDA streaming application development for irregular applications. A developer can use MERCATOR to decompose an irregular application for the GPU without explicitly remapping work to threads at runtime. MERCATOR applications are efficiently parallelized on the GPU through a combination of replication across blocks and queueing between nodes to accommodate irregularity. We quantify the performance impact of MERCATOR's support for irregularity and illustrate its utility by implementing a biological sequence comparison pipeline similar to the well-known NCBI BLASTN algorithm. MERCATOR code is available by request to the first author.
Citations: 12
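The irregularity MERCATOR targets — nodes that filter items out or expand one item into several, absorbed by queues between nodes — can be illustrated with a small CPU sketch. This is a toy model of the dataflow pattern only; the node names and `run_pipeline` function are made up here and are not MERCATOR's actual CUDA API.

```python
from collections import deque

def run_pipeline(items, nodes):
    """Push items through a chain of nodes; each node maps one input to
    zero or more outputs, so queue lengths vary in a data-dependent way."""
    queue = deque(items)
    for node in nodes:
        next_queue = deque()
        while queue:
            # 0..n outputs per input: the source of irregularity
            next_queue.extend(node(queue.popleft()))
        queue = next_queue
    return list(queue)

# Data-dependent filtering and expansion, the two irregular behaviors
# the abstract names.
drop_odd = lambda x: [x] if x % 2 == 0 else []
duplicate = lambda x: [x, x + 1]

result = run_pipeline(range(6), [drop_odd, duplicate])
```

On the GPU, the point of the queues is to re-pack such variable-size outputs so that full thread blocks stay busy despite the filtering.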
PHAST Library — Enabling Single-Source and High Performance Code for GPUs and Multi-cores
Biagio Peccerillo, S. Bartolini
DOI: https://doi.org/10.1109/HPCS.2017.109
Abstract: The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted to the specific architectures and effectively tuned on each of them, while avoiding biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits writing code at a high level of abstraction and reaching good performance, while allowing fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that the PHAST implementation is around ten times faster than the OpenCL (AMD vendor) implementation, but up to about 4x slower than the OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than the CUDA SDK reference version.
Citations: 6
Large-Scale Memory of Sequences Using Binary Sparse Neural Networks on GPU
M. R. S. Marques, G. B. Hacene, C. Lassance, Pierre-Henri Horrein
DOI: https://doi.org/10.1109/HPCS.2017.88
Abstract: Associative memories are models capable of storing and retrieving messages given only a part of their content. These systems have been used in several applications such as database engines, network routers, natural language processing and image recognition due to their error-correction capability in pattern retrieval. Recently, Gripon and Berrou introduced a sparse associative memory based on cliques which achieves almost optimal storage efficiency (ratio of useful bits stored to bits used). The Binary Tournament-based Neural Network is an extension of clique-based sparse associative memories used to store and retrieve long sequences of patterns. This paper proposes solutions to increase the retrieval performance of a parallel architecture for this memory, together with a scalable and optimized implementation of the required algorithms.
Citations: 1
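The clique-based memory the abstract builds on can be sketched in a few lines: a message selects one unit per cluster and is stored as a clique (all-to-all edges) among the selected units; retrieval scores candidate units by how many edges connect them to the known part of the message. This is a minimal toy in the spirit of Gripon-Berrou, with made-up sizes, not the paper's tournament-based or GPU version.

```python
import itertools

C, L = 4, 8  # hypothetical sizes: C clusters of L units each

def store(messages):
    """Each message is a tuple of unit indices, one per cluster; storing
    it adds the clique of edges among the chosen (cluster, unit) pairs."""
    edges = set()
    for msg in messages:
        units = [(c, msg[c]) for c in range(C)]
        for a, b in itertools.combinations(units, 2):
            edges.add((a, b)); edges.add((b, a))
    return edges

def retrieve(edges, partial):
    """partial: dict cluster -> known unit. Each missing cluster is filled
    with the unit receiving the most edges from the known units."""
    known = [(c, u) for c, u in partial.items()]
    msg = dict(partial)
    for c in range(C):
        if c in partial:
            continue
        score = lambda u: sum(((kc, ku), (c, u)) in edges for kc, ku in known)
        msg[c] = max(range(L), key=score)
    return tuple(msg[c] for c in range(C))

net = store([(1, 3, 5, 7), (2, 2, 2, 2)])
recovered = retrieve(net, {0: 1, 1: 3})  # half the message is enough
```

The storage efficiency the abstract mentions comes from the edges being binary and shared: each stored clique costs only a handful of bits in the adjacency structure.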
Understanding the Performances of Sparse Compression Formats Using Data Parallel Programming Model
Ichrak Mehrez, O. Hamdi-Larbi, T. Dufaud, N. Emad
DOI: https://doi.org/10.1109/HPCS.2017.103
Abstract: Several applications in numerical scientific computing involve very large sparse matrices with a regular or irregular sparse structure. These matrices can be stored using special compression formats (storing only non-zero elements) to reduce memory space and processing time. The choice of the optimal format is a critical process that involves several criteria. The general context of this work is to propose an auto-tuner system that, given a sparse matrix, a numerical method, a parallel programming model and an architecture, can automatically select the Optimal Compression Format (OCF). In this paper we study the performance of two different sparse compression formats, namely CSR (Compressed Sparse Row) and CSC (Compressed Sparse Column). We propose data parallel algorithms for the Sparse Matrix-Vector Product in the case of each format, and we extract a set of parameters that can help select the most suitable compression format for a given sparse matrix.
Citations: 3
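The two formats the paper compares differ in how the sparse matrix-vector product traverses the matrix: CSR computes one dot product per row, while CSC scatters one column at a time into the result. A minimal sequential sketch (not the paper's data-parallel algorithms):

```python
import numpy as np

def spmv_csr(vals, col_idx, row_ptr, x):
    """CSR SpMV: row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):           # one dot product per row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_csc(vals, row_idx, col_ptr, x, n_rows):
    """CSC SpMV: col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j."""
    y = np.zeros(n_rows)
    for j in range(len(col_ptr) - 1):           # scatter one column at a time
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += vals[k] * x[j]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
x = np.array([1.0, 1.0, 1.0])
y_csr = spmv_csr([1, 2, 3], [0, 2, 1], [0, 2, 3], x)
y_csc = spmv_csc([1, 3, 2], [0, 1, 0], [0, 1, 2, 3], x, 2)
```

The parallelization trade-off follows directly: CSR rows are independent (easy to parallelize, but load-imbalanced for irregular structures), while CSC columns write to shared output entries and need atomics or a reduction.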
Analyzing Performance of Multi-cores and Applications with Cache-aware Roofline Model
Diogo Marques, Helder Duarte, L. Sousa, A. Ilic
DOI: https://doi.org/10.1109/HPCS.2017.158
Abstract: To satisfy the growing computational demands of modern applications, significant enhancements have been introduced in contemporary processor architectures with the aim of increasing their attainable performance, such as an increased number of cores, improved memory-subsystem capability and enhancements in the processor pipeline [1]. These performance improvements are usually coupled with increased complexity at the architecture level, which imposes additional challenges when designing, prototyping and optimizing the execution of real-world applications on a given compute platform. Since application performance depends on multiple factors, e.g., multi-threading, vectorization efficiency and memory accesses, achieving the most efficient execution is not a trivial task, especially when aiming at fully exploiting the capabilities of modern multi-core processors.
Citations: 0
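The Roofline model the title refers to bounds attainable performance by the lower of peak compute and memory bandwidth times arithmetic intensity; the cache-aware variant draws one bandwidth ceiling per memory level. A sketch with illustrative (not measured) machine numbers:

```python
def roofline(ai, peak_gflops, bw_gbytes):
    """Attainable GFLOP/s for one ceiling: ai is FLOPs per byte moved."""
    return min(peak_gflops, ai * bw_gbytes)

PEAK = 500.0                                          # hypothetical GFLOP/s
LEVELS = {"L1": 1000.0, "L2": 400.0, "DRAM": 50.0}    # hypothetical GB/s

# A kernel with 2 FLOPs/byte is compute-bound from L1 but heavily
# bandwidth-bound if its working set spills to DRAM.
ceilings = {lvl: roofline(2.0, PEAK, bw) for lvl, bw in LEVELS.items()}
```

Placing a measured kernel against these per-level ceilings is what tells the analyst whether to chase vectorization (compute-bound) or locality (bandwidth-bound).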
High Performance Recursive Matrix Inversion for Multicore Architectures
R. Mahfoudhi, Sami Achour, O. Hamdi-Larbi, Z. Mahjoub
DOI: https://doi.org/10.1109/HPCS.2017.104
Abstract: There are several approaches for computing the inverse of a dense square matrix A, namely Gaussian elimination, block-wise inversion, and LU factorization (LUF). The latter is used in mathematical software libraries such as SCALAPACK, PBLAS and MATLAB. The inversion routine in the SCALAPACK library (PDGETRI) consists, once the two factors L and U are known (where A = LU), in first inverting U, then solving a triangular matrix system giving A−1. A symmetric way consists in first inverting L, then solving a matrix system giving A−1. Alternatively, one could compute the inverses of both U and L, then their product, and get A−1. On the other hand, the Strassen fast matrix inversion algorithm is known as an efficient alternative for solving our problem. We propose in this paper a series of different versions of parallel dense matrix inversion based on the 'Divide and Conquer' paradigm. A theoretical performance study establishes an accurate comparison between the designed algorithms. We conducted a series of experiments that validate the contribution and yield efficient performance for large matrix sizes, i.e. up to 40% faster than SCALAPACK.
Citations: 3
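The block-wise, divide-and-conquer approach the abstract lists splits A into 2x2 blocks and recurses via the Schur complement. A serial sketch (the paper's contribution is the parallelization, not this recurrence itself):

```python
import numpy as np

def block_inv(A):
    """Recursive 2x2-block inversion: invert A11, form the Schur
    complement S = A22 - A21 A11^-1 A12, invert S, assemble the blocks."""
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    m = n // 2
    A11, A12 = A[:m, :m], A[:m, m:]
    A21, A22 = A[m:, :m], A[m:, m:]
    B11 = block_inv(A11)
    S = A22 - A21 @ B11 @ A12              # Schur complement of A11
    S_inv = block_inv(S)
    top_left = B11 + B11 @ A12 @ S_inv @ A21 @ B11
    top_right = -B11 @ A12 @ S_inv
    bottom_left = -S_inv @ A21 @ B11
    return np.block([[top_left, top_right], [bottom_left, S_inv]])

# Diagonally dominant test matrix, so every A11 and S stays invertible.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 8 * np.eye(8)
err = np.max(np.abs(block_inv(A) @ A - np.eye(8)))
```

The two recursive calls (on A11 and on S) are independent of neither each other nor the surrounding products, which is exactly why scheduling the block operations across cores is the interesting part.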
Cryptanalysis on GPUs with the Cube Attack: Design, Optimization and Performances Gains
Marco Cianfriglia, Stefano Guarino
DOI: https://doi.org/10.1109/HPCS.2017.114
Abstract: The cube attack is a flexible cryptanalysis technique, with a simple and fascinating theoretical implant. It combines offline exhaustive searches over selected tweakable public/IV bits (the sides of the "cube"), with an online key-recovery phase. Although virtually applicable to any cipher, and generally praised by the research community, the real potential of the attack is still in question, and no implementation so far has succeeded in breaking a real-world strong cipher. In this paper, we present, validate and analyze the first thorough implementation of the cube attack on a GPU cluster. The framework is conceived so as to be usable out-of-the-box for any cipher featuring up to a 128-bit key and IV, and easily adaptable to larger keys/IVs at just the cost of some fine (performance) tuning, mostly related to memory allocation. As a test case, we consider previous state-of-the-art results against a reduced-round version of a well-known cipher (Trivium). We evaluate the computational speedup with respect to a CPU-parallel benchmark, the performance dependence on system parameters and GPU architectures (Nvidia Kepler vs Nvidia Pascal), and the scalability of our solution on multi-GPU systems. All design choices are carefully described, and their respective advantages and drawbacks are discussed. By exhibiting the benefits of a complete GPU-tailored implementation of the cube attack, we provide novel and strong elements in support of the general feasibility of the attack, thus paving the way for future work in the area.
Citations: 4
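The offline phase the abstract describes can be shown on a toy scale: XOR-summing the cipher's output bit over all assignments of the chosen cube (IV) bits cancels every term not divisible by the full cube monomial, leaving the cube's "superpoly" in the key bits. The polynomial below is an invented example, not Trivium.

```python
import itertools

def f(k, v):
    """Toy cipher output bit over key bits k and IV bits v:
    f = v0*v1*(k0 ^ k2) ^ v0*k1 ^ v2 ^ k0*k1*k2."""
    return ((v[0] & v[1] & (k[0] ^ k[2])) ^ (v[0] & k[1])
            ^ v[2] ^ (k[0] & k[1] & k[2]))

def cube_sum(k, cube_bits, n_iv=3):
    """XOR f over all values of the cube bits; other IV bits held at 0."""
    acc = 0
    for assign in itertools.product([0, 1], repeat=len(cube_bits)):
        v = [0] * n_iv
        for bit, val in zip(cube_bits, assign):
            v[bit] = val
        acc ^= f(k, v)
    return acc

# Summing over the cube {v0, v1}: the v0*k1 term is XORed an even number
# of times and cancels, so the sum equals the superpoly k0 ^ k2.
```

The GPU angle is that these exhaustive sums over cube vertices are embarrassingly parallel, which is what the paper's cluster implementation exploits at real cipher sizes.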
DGS-SMS Compact Fifth Order Low Pass Filter
Heba El-Halabi, H. Issa, D. Kaddour, E. Pistono, P. Ferrari
DOI: https://doi.org/10.1109/HPCS.2017.49
Abstract: This paper presents slow-wave microstrip transmission lines based on a double-layer PCB substrate. The slow-wave effect is obtained by inserting metallic via rows within the lower substrate layer. Based on this concept, miniaturized high-rejection 5th-order low-pass stepped-impedance filters with a cut-off frequency of 2.45 GHz are designed, fabricated and measured. Measurement results are in good agreement with simulations. Thanks to the slow-wave transmission lines, the size of the realized filters is reduced by 54% as compared to a classical filter prototype.
Citations: 0
Information Retrieval Based on Description Logic: Application to Biomedical Documents
Kabil Boukhari, Mohamed Nazih Omri
DOI: https://doi.org/10.1109/HPCS.2017.128
Abstract: Document indexing is a fairly sensitive phase in information retrieval. However, the terms present in a document are not sufficient to completely represent it, so the exploitation of implicit information through external resources is necessary for better indexing. For this purpose, a new indexing model for biomedical documents based on description logics is proposed to generate relevant indexes. The documents and the external resource are represented by descriptive expressions: a first, statistical phase assigns an importance degree to each term in the document, and a semantic phase extracts the most important concepts of the MeSH thesaurus (Medical Subject Headings). The concept-extraction step uses description logics to combine the statistical and semantic approaches, followed by a cleaning step to select the most important indexes for the document representation. For the experimental phase, we used the OHSUMED collection; the results showed the effectiveness of the proposed approach and the importance of using description logics in the indexing process.
Citations: 5
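The abstract's statistical phase assigns each term an importance degree. TF-IDF is the standard such weighting and serves as an illustrative stand-in here; the paper's exact formula may differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Weight of term t in a document is its
    term frequency times log(N / document frequency of t)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return weights

# Toy biomedical snippets; terms appearing in fewer documents score higher.
docs = [["aspirin", "headache", "dose"],
        ["aspirin", "fever"],
        ["fever", "thermometer", "fever"]]
w = tf_idf(docs)
```

In the proposed model these statistical weights are then combined, via the description-logic expressions, with the semantic degrees coming from the MeSH concepts.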
Efficient Initial Guess Determination Based on 3D Point Cloud Projection for ICP Algorithms
Mouna Attia, Y. Slama
DOI: https://doi.org/10.1109/HPCS.2017.122
Abstract: The standard Iterative Closest Point (ICP) algorithm is a robust and efficient rigid registration algorithm for 3D point clouds. Nevertheless, its efficiency notably decreases when applied to large-transformation cases. In order to avoid this drawback and improve its performance, we propose a new 3-step approach based on ICP and point cloud projection, called ICP-CP, that both enhances accuracy in most cases and reduces the execution time. The first step consists in determining the best projection plane, i.e. the one that preserves the topological structure of the point cloud. In the second, we compute an initial guess using the projected points. The third performs successive iterations until reaching the best transformation. The novelty of our approach is the use of the projected points to find an adequate initialization for the ICP algorithm, ensuring fast convergence. We conducted a series of experiments on both benchmarks and real clouds to validate our contribution and show that, in most cases, the proposed algorithm is more accurate and robust than standard ICP in a variety of situations, namely large translation, large rotation, and both simultaneously.
Citations: 13
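The value of a good initial guess is easiest to see with the standard SVD (Kabsch) estimate of a rigid transform from corresponding points: given correspondences, it recovers even a large rotation and translation in closed form, which is the kind of starting point that lets ICP converge quickly. This is a generic stand-in, not the paper's projection-based ICP-CP procedure itself.

```python
import numpy as np

def rigid_guess(P, Q):
    """Find R, t minimizing ||R @ P + t - Q|| for 3xN corresponding points
    (Kabsch): align centroids, then take R from the SVD of the covariance."""
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

rng = np.random.default_rng(1)
P = rng.standard_normal((3, 50))
theta = 1.2                                    # a large rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([[5.0], [-2.0], [3.0]])      # a large translation
R, t = rigid_guess(P, R_true @ P + t_true)
```

ICP's hard part is that correspondences are unknown; the paper's contribution is obtaining a usable initial transform from 2D projections before any such closed-form alignment can be trusted.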