用于传输线建模加速的片上系统矢量多处理器

V. Chouliaras, J. Flint, Yibin Li, J. Núñez-Yáñez
{"title":"用于传输线建模加速的片上系统矢量多处理器","authors":"V. Chouliaras, J. Flint, Yibin Li, J. Núñez-Yáñez","doi":"10.1109/SIPS.2005.1579931","DOIUrl":null,"url":null,"abstract":"We discuss a configurable, system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm with an architecture capable of exploiting the two primary forms of parallelism in the code, thread and data level parallelism. Theoretical results demonstrate an order of magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations which is orthogonal to the benefit realized by vectorization alone. We discuss in detail this potent architecture and present implementation data for the 2-way multi-processor VLSI macrocell.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A system-on-chip vector multiprocessor for transmission line modelling acceleration\",\"authors\":\"V. Chouliaras, J. Flint, Yibin Li, J. Núñez-Yáñez\",\"doi\":\"10.1109/SIPS.2005.1579931\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We discuss a configurable, system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm with an architecture capable of exploiting the two primary forms of parallelism in the code, thread and data level parallelism. Theoretical results demonstrate an order of magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations which is orthogonal to the benefit realized by vectorization alone. We discuss in detail this potent architecture and present implementation data for the 2-way multi-processor VLSI macrocell.\",\"PeriodicalId\":436123,\"journal\":{\"name\":\"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.\",\"volume\":\"111 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-11-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIPS.2005.1579931\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIPS.2005.1579931","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

我们讨论了一种可配置的片上系统矢量多处理器,用于加速传输线建模(TLM)算法,其架构能够利用代码、线程和数据级并行的两种主要并行形式。理论结果表明,在矢量长度为16个32位单精度元件的情况下,标量处理器/矢量协处理器配置的动态指令计数减少了一个数量级。此外,由十个这样的矢量加速器组成的多矢量SoC架构在四种基准配置中的三种中提供了88%的近线性理论性能优势,这与单独矢量化实现的优势是正交的。我们详细讨论了这种强大的架构,并给出了双向多处理器VLSI宏单元的实现数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A system-on-chip vector multiprocessor for transmission line modelling acceleration
We discuss a configurable, system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm with an architecture capable of exploiting the two primary forms of parallelism in the code, thread and data level parallelism. Theoretical results demonstrate an order of magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations which is orthogonal to the benefit realized by vectorization alone. We discuss in detail this potent architecture and present implementation data for the 2-way multi-processor VLSI macrocell.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信