Kyung Hoon Kim, Priyank Devpura, Abhishek Nayyar, Andrew Doolittle, K. H. Yum, Eun Jung Kim
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2019-05-20. DOI: 10.1109/IPDPS.2019.00076 (https://doi.org/10.1109/IPDPS.2019.00076)
Dual Pattern Compression Using Data-Preprocessing for Large-Scale GPU Architectures
Graphics Processing Units (GPUs) have been widely adopted for diverse general-purpose applications because of their massive parallelism. Demand for large-scale GPUs that process large volumes of data at high throughput has been rising rapidly. However, designing a bandwidth-efficient network for a large-scale GPU is challenging. Compression is a practical remedy: by reducing the size of the data transferred, it effectively increases network bandwidth. We propose a new, simple compression mechanism, Dual Pattern Compression (DPC), that compresses only two patterns and does so with very low latency. The simplicity of compression and decompression is achieved through data remapping and data-type-aware preprocessing, which exploit bit-level data redundancy. The data type is detected at runtime. We demonstrate that our compression scheme effectively mitigates network congestion in a large-scale GPU. It improves IPC by 33% on average (up to 126%) across various benchmarks, with average space savings ratios of 61% in integer, 46% (up to 72%) in floating-point, and 23% (up to 57%) in character-type benchmarks.
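To make "bit-level data redundancy" and "space savings ratio" concrete, the following Python sketch may help. It is not the DPC design: the abstract does not say which two patterns DPC compresses or how data is remapped. The sketch assumes a bit-plane transpose as the preprocessing step, assumes the two compressible patterns are all-zeros and all-ones planes, and assumes a 2-bit tag per plane. All function names are hypothetical. The space savings ratio is taken to be 1 - compressed_size / original_size.

```python
def space_savings(original_bits: int, compressed_bits: int) -> float:
    """Space savings ratio: fraction of the original size that is removed."""
    return 1.0 - compressed_bits / original_bits


def bit_planes(words, width=32):
    """Transpose a block of integers into `width` bit planes.

    Plane i collects bit i of every word (word 0 lands in the plane's LSB).
    Assumed preprocessing step, for illustration only.
    """
    mask = (1 << width) - 1
    planes = []
    for bit in range(width):
        plane = 0
        for idx, w in enumerate(words):
            plane |= (((w & mask) >> bit) & 1) << idx
        planes.append(plane)
    return planes


def toy_compressed_bits(words, width=32, tag_bits=2):
    """Size in bits under a hypothetical two-pattern scheme.

    A plane that is all zeros or all ones costs only its tag; any other
    plane costs the tag plus one raw bit per word.
    """
    all_ones = (1 << len(words)) - 1
    total = 0
    for plane in bit_planes(words, width):
        if plane == 0 or plane == all_ones:
            total += tag_bits               # pattern hit: tag only
        else:
            total += tag_bits + len(words)  # miss: tag + raw plane
    return total


if __name__ == "__main__":
    block = [3, 5, 7, 2, 6, 1, 4, 0]        # eight small 32-bit integers
    original = len(block) * 32
    compressed = toy_compressed_bits(block)
    print(original, compressed, space_savings(original, compressed))
```

For this block of small integers, only the three low-order planes carry information. The other 29 planes are all zeros, so the toy encoder stores them as tags alone. Small negative integers behave the same way with all-ones high-order planes, which is why the sketch treats both patterns as compressible.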