Parallelized Network Coding with SIMD Instruction Sets

2008 International Symposium on Computer Science and Computational Technology. Pub Date: 2008-12-20. DOI: 10.1109/ISCSCT.2008.141

Han Li, Huan-yan Qian


Citations: 3

Abstract

It is a well-known result that network coding can achieve better network throughput in certain multicast topologies. However, the practicality of network coding has been questioned because of its high computational complexity. This paper presents an attempt at a high-performance implementation of network coding. We first propose implementing progressive decoding with Gauss-Jordan elimination, so that blocks can be decoded as they are received. We then employ hardware acceleration with SIMD vector instructions. We also use a careful threading design to take advantage of symmetric multiprocessor (SMP) systems and multicore processors. The core idea of our optimization is table-based multiplication in GF(2^8), which processes a row multiplication of random linear codes by looking up previously built product tables in vector registers with the SSSE3 instruction PSHUFB. Our high-performance implementation is encapsulated as a C++ class library. On a dual-core Intel T5500 1.66 GHz PC, the encoding bandwidth of our implementation reaches 42.493 MB/s with 128 blocks of 4 KB each.

