Compressing Floating-Point Number Stream for Numerical Applications

Hisanobu Tomari, M. Inaba, K. Hiraki
{"title":"用于数值应用的浮点数流压缩","authors":"Hisanobu Tomari, M. Inaba, K. Hiraki","doi":"10.1109/IC-NC.2010.24","DOIUrl":null,"url":null,"abstract":"A cluster of commodity computers and general-purpose computers with accelerators such as GPGPUs are now common platforms to solve computationally intensive tasks like scientific simulations. Both technologies provide users with high performance at relatively low cost. However, the low bandwidth of interconnect compared to the computing performance hinders efficient operation of both cluster and accelerator in the case of many algorithms that require heavy data transmission. For clusters the network is one of the major performance bottlenecks, and for accelerators the peripheral bus to transfer data from host to the memory on the accelerator card is. In this paper, we propose a method of accelerating the performance of floating-point intensive algorithms by compressing the floating point number stream. With the efficient software encoder and hardware decoder, the method eliminates redundancy in the exponential part in the array of numbers on the stream and compacts the entire array to 82.8% of its original size at theoretical limit. The compression ratio is better than Gzip or Bzip2 for floating point numbers. The reduction in communication time directly leads to the reduction in total application running time for programs whose processing time is largely dominated by communication performance. We implemented a high-speed decoder using FPGA that operates at over 6 GB/s. We estimated the application performance using FFT and matrix multiplication on a cluster and the GRAPE-DR accelerator respectively, and our approach is useful in both configurations.","PeriodicalId":375145,"journal":{"name":"2010 First International Conference on Networking and Computing","volume":"201 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Compressing Floating-Point Number Stream for Numerical Applications\",\"authors\":\"Hisanobu Tomari, M. Inaba, K. Hiraki\",\"doi\":\"10.1109/IC-NC.2010.24\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A cluster of commodity computers and general-purpose computers with accelerators such as GPGPUs are now common platforms to solve computationally intensive tasks like scientific simulations. Both technologies provide users with high performance at relatively low cost. However, the low bandwidth of interconnect compared to the computing performance hinders efficient operation of both cluster and accelerator in the case of many algorithms that require heavy data transmission. For clusters the network is one of the major performance bottlenecks, and for accelerators the peripheral bus to transfer data from host to the memory on the accelerator card is. In this paper, we propose a method of accelerating the performance of floating-point intensive algorithms by compressing the floating point number stream. With the efficient software encoder and hardware decoder, the method eliminates redundancy in the exponential part in the array of numbers on the stream and compacts the entire array to 82.8% of its original size at theoretical limit. The compression ratio is better than Gzip or Bzip2 for floating point numbers. 
The reduction in communication time directly leads to the reduction in total application running time for programs whose processing time is largely dominated by communication performance. We implemented a high-speed decoder using FPGA that operates at over 6 GB/s. We estimated the application performance using FFT and matrix multiplication on a cluster and the GRAPE-DR accelerator respectively, and our approach is useful in both configurations.\",\"PeriodicalId\":375145,\"journal\":{\"name\":\"2010 First International Conference on Networking and Computing\",\"volume\":\"201 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 First International Conference on Networking and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IC-NC.2010.24\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 First International Conference on Networking and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IC-NC.2010.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Clusters of commodity computers and general-purpose computers equipped with accelerators such as GPGPUs are now common platforms for computationally intensive tasks like scientific simulations. Both technologies provide users with high performance at relatively low cost. However, for many algorithms that require heavy data transfer, the low bandwidth of the interconnect relative to the computing performance hinders efficient operation of both clusters and accelerators. For clusters, the network is one of the major performance bottlenecks; for accelerators, it is the peripheral bus that transfers data from the host to the memory on the accelerator card. In this paper, we propose a method of accelerating floating-point-intensive algorithms by compressing the floating-point number stream. Using an efficient software encoder and a hardware decoder, the method eliminates redundancy in the exponent part of the number array on the stream and compacts the entire array to 82.8% of its original size at the theoretical limit. The compression ratio is better than Gzip or Bzip2 for floating-point numbers. The reduction in communication time directly reduces the total running time of applications whose processing time is largely dominated by communication performance. We implemented a high-speed decoder using an FPGA that operates at over 6 GB/s. We estimated the application performance using FFT on a cluster and matrix multiplication on the GRAPE-DR accelerator, and our approach is useful in both configurations.
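The abstract does not spell out the encoding format, so the C sketch below is only a hypothetical illustration of the general idea of exponent-redundancy elimination: consecutive values in numerical arrays often share the same sign-and-exponent field, so that field can be transmitted once and marked as repeated afterwards. The byte-aligned layout used here (a flag bit, the 52-bit mantissa, and an optional 12-bit sign/exponent field) is an assumption made for readability; it approaches 7/8 = 87.5% of the original size rather than the bit-packed 53/64, roughly the 82.8% theoretical limit quoted in the paper, and the names encode and to_bits are illustrative, not from the paper.

/*
 * Hypothetical sketch, not the paper's encoder: it only illustrates the
 * idea of removing redundant sign/exponent fields from a stream of IEEE 754
 * doubles when consecutive values share the same field.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a double as raw bits without violating strict aliasing. */
static uint64_t to_bits(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/*
 * Encode n doubles into out[] and return the number of bytes written.
 * Assumed format per value (byte-aligned for readability):
 *   - a 7-byte chunk: 52-bit mantissa in bits 0..51, "exponent repeated"
 *     flag in bit 55
 *   - if the flag is 0, two further bytes holding the new 12-bit
 *     sign-and-exponent field
 */
static size_t encode(const double *in, size_t n, uint8_t *out) {
    size_t pos = 0;
    uint16_t prev_se = 0xFFFF;                      /* sentinel: none yet */
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = to_bits(in[i]);
        uint16_t se = (uint16_t)(bits >> 52);       /* sign + 11-bit exponent */
        uint64_t mant = bits & ((1ULL << 52) - 1);  /* 52-bit mantissa */
        int repeat = (se == prev_se);
        uint64_t chunk = mant | ((uint64_t)repeat << 55);
        for (int b = 0; b < 7; b++)                 /* emit 56-bit chunk */
            out[pos++] = (uint8_t)(chunk >> (8 * b));
        if (!repeat) {                              /* new exponent follows */
            out[pos++] = (uint8_t)(se & 0xFF);
            out[pos++] = (uint8_t)(se >> 8);
            prev_se = se;
        }
    }
    return pos;
}

int main(void) {
    /* Values of similar magnitude, as is common in simulation data. */
    double data[] = { 1.5, 1.25, 1.75, 1.0625, 2.5, 2.125 };
    size_t n = sizeof data / sizeof data[0];
    uint8_t buf[9 * 6];                             /* worst case: 9 B/value */
    size_t packed = encode(data, n, buf);
    printf("original %zu bytes, packed %zu bytes (%.1f%%)\n",
           n * sizeof(double), packed,
           100.0 * (double)packed / (double)(n * sizeof(double)));
    return 0;
}

On this small array the exponent field repeats for most values, so the stream shrinks to roughly 96% of its original size; a bit-level packing of the same information, as implied by the paper's theoretical limit, would do better still.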