基于数据预处理的大规模GPU架构双模式压缩

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2019-05-20 DOI:10.1109/IPDPS.2019.00076

Kyung Hoon Kim, Priyank Devpura, Abhishek Nayyar, Andrew Doolittle, K. H. Yum, Eun Jung Kim

{"title":"基于数据预处理的大规模GPU架构双模式压缩","authors":"Kyung Hoon Kim, Priyank Devpura, Abhishek Nayyar, Andrew Doolittle, K. H. Yum, Eun Jung Kim","doi":"10.1109/IPDPS.2019.00076","DOIUrl":null,"url":null,"abstract":"Graphics Processing Units (GPUs) have been widely accepted for diverse general purpose applications due to a massive degree of parallelism. The demand for large-scale GPUs processing a large volume of data with high throughput has been rising rapidly. However, in large-scale GPUs, a bandwidth-efficient network design is challenging. Compression techniques are a practical remedy to effectively increase network bandwidth by reducing data size transferred. We propose a new simple compression mechanism, Dual Pattern Compression (DPC), that compresses only two patterns with a very low latency. The simplicity of compression/decompression is achieved through data remapping and data-type-aware data preprocessing which exploits bit-level data redundancy. The data type is detected during runtime. We demonstrate that our compression scheme effectively mitigates the network congestion in a large-scale GPU. It achieves IPC improvement by 33% on average (up to 126%) across various benchmarks with average space savings ratios of 61% in integer, 46% (up to 72%) in floating-point and 23% (up to 57%) in character type benchmarks.","PeriodicalId":403406,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"194 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Dual Pattern Compression Using Data-Preprocessing for Large-Scale GPU Architectures\",\"authors\":\"Kyung Hoon Kim, Priyank Devpura, Abhishek Nayyar, Andrew Doolittle, K. H. Yum, Eun Jung Kim\",\"doi\":\"10.1109/IPDPS.2019.00076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graphics Processing Units (GPUs) have been widely accepted for diverse general purpose applications due to a massive degree of parallelism. The demand for large-scale GPUs processing a large volume of data with high throughput has been rising rapidly. However, in large-scale GPUs, a bandwidth-efficient network design is challenging. Compression techniques are a practical remedy to effectively increase network bandwidth by reducing data size transferred. We propose a new simple compression mechanism, Dual Pattern Compression (DPC), that compresses only two patterns with a very low latency. The simplicity of compression/decompression is achieved through data remapping and data-type-aware data preprocessing which exploits bit-level data redundancy. The data type is detected during runtime. We demonstrate that our compression scheme effectively mitigates the network congestion in a large-scale GPU. It achieves IPC improvement by 33% on average (up to 126%) across various benchmarks with average space savings ratios of 61% in integer, 46% (up to 72%) in floating-point and 23% (up to 57%) in character type benchmarks.\",\"PeriodicalId\":403406,\"journal\":{\"name\":\"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"194 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2019.00076\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2019.00076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

由于大量的并行性，图形处理单元(gpu)已被广泛接受用于各种通用应用程序。对处理大数据量、高吞吐量的大型gpu的需求一直在快速增长。然而，在大规模gpu中，带宽高效的网络设计是具有挑战性的。压缩技术是通过减少传输的数据量来有效增加网络带宽的一种实用补救措施。我们提出了一种新的简单的压缩机制，双模式压缩(DPC)，它只压缩两个模式，具有非常低的延迟。压缩/解压缩的简单性是通过数据重映射和数据类型感知的数据预处理实现的，这种预处理利用了位级数据冗余。在运行时期间检测数据类型。我们证明了我们的压缩方案有效地缓解了大规模GPU中的网络拥塞。它在各种基准测试中平均实现了33%(最高126%)的IPC改进，在整数测试中平均节省了61%的空间，在浮点测试中节省了46%(最高72%)的空间，在字符类型测试中节省了23%(最高57%)的空间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Dual Pattern Compression Using Data-Preprocessing for Large-Scale GPU Architectures

Graphics Processing Units (GPUs) have been widely accepted for diverse general purpose applications due to a massive degree of parallelism. The demand for large-scale GPUs processing a large volume of data with high throughput has been rising rapidly. However, in large-scale GPUs, a bandwidth-efficient network design is challenging. Compression techniques are a practical remedy to effectively increase network bandwidth by reducing data size transferred. We propose a new simple compression mechanism, Dual Pattern Compression (DPC), that compresses only two patterns with a very low latency. The simplicity of compression/decompression is achieved through data remapping and data-type-aware data preprocessing which exploits bit-level data redundancy. The data type is detected during runtime. We demonstrate that our compression scheme effectively mitigates the network congestion in a large-scale GPU. It achieves IPC improvement by 33% on average (up to 126%) across various benchmarks with average space savings ratios of 61% in integer, 46% (up to 72%) in floating-point and 23% (up to 57%) in character type benchmarks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量