A system-on-chip vector multiprocessor for transmission line modelling acceleration

IEEE Workshop on Signal Processing Systems Design and Implementation, 2005. Pub Date: 2005-11-02. DOI: 10.1109/SIPS.2005.1579931

V. Chouliaras, J. Flint, Yibin Li, J. Núñez-Yáñez


Citations: 2


A system-on-chip vector multiprocessor for transmission line modelling acceleration

We discuss a configurable system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm, with an architecture capable of exploiting the two primary forms of parallelism in the code: thread-level and data-level parallelism. Theoretical results demonstrate an order-of-magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations, which is orthogonal to the benefit realized by vectorization alone. We discuss this architecture in detail and present implementation data for the 2-way multi-processor VLSI macrocell.
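The abstract does not show the TLM kernel itself, so the following is only an illustrative sketch of the data-level parallelism it refers to. It assumes the textbook 2D shunt-node scatter step (each reflected pulse equals half the sum of the four incident pulses minus the incident pulse on that port), which may differ from the node type used in the paper. The names `scatter_node`, `scatter_mesh`, and `VECTOR_LENGTH` are hypothetical and chosen for this example only.

```python
# Illustrative sketch only: a 2D shunt-node TLM scatter step (assumed node type,
# not confirmed by the abstract), processed in strips of 16 nodes to mirror the
# vector length quoted in the abstract.

VECTOR_LENGTH = 16  # elements per vector operation, as quoted in the abstract


def scatter_node(incident):
    """Scatter the four incident pulses of one node: reflected = sum/2 - incident."""
    half_sum = 0.5 * sum(incident)
    return [half_sum - v for v in incident]


def scatter_mesh(nodes, vector_length=VECTOR_LENGTH):
    """Scatter every node, walking the mesh in strips of `vector_length` nodes.

    Every node is independent of the others in this step, so a vector unit
    could process one whole strip with a few vector instructions instead of
    one scalar instruction sequence per node.
    """
    reflected = []
    for start in range(0, len(nodes), vector_length):
        strip = nodes[start:start + vector_length]
        reflected.extend(scatter_node(node) for node in strip)
    return reflected


if __name__ == "__main__":
    mesh = [[1.0, 0.0, 0.0, 0.0] for _ in range(40)]
    print(scatter_mesh(mesh)[0])
```

Because the strips are also independent of one another, they could in principle be distributed across several vector accelerators, which is the thread-level parallelism the abstract mentions; how the paper actually partitions the work is not stated here.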
