{"title":"用于传输线建模加速的片上系统矢量多处理器","authors":"V. Chouliaras, J. Flint, Yibin Li, J. Núñez-Yáñez","doi":"10.1109/SIPS.2005.1579931","DOIUrl":null,"url":null,"abstract":"We discuss a configurable, system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm with an architecture capable of exploiting the two primary forms of parallelism in the code, thread and data level parallelism. Theoretical results demonstrate an order of magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations which is orthogonal to the benefit realized by vectorization alone. We discuss in detail this potent architecture and present implementation data for the 2-way multi-processor VLSI macrocell.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A system-on-chip vector multiprocessor for transmission line modelling acceleration\",\"authors\":\"V. Chouliaras, J. Flint, Yibin Li, J. Núñez-Yáñez\",\"doi\":\"10.1109/SIPS.2005.1579931\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We discuss a configurable, system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm with an architecture capable of exploiting the two primary forms of parallelism in the code, thread and data level parallelism. Theoretical results demonstrate an order of magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations which is orthogonal to the benefit realized by vectorization alone. We discuss in detail this potent architecture and present implementation data for the 2-way multi-processor VLSI macrocell.\",\"PeriodicalId\":436123,\"journal\":{\"name\":\"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.\",\"volume\":\"111 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-11-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIPS.2005.1579931\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIPS.2005.1579931","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A system-on-chip vector multiprocessor for transmission line modelling acceleration
We discuss a configurable, system-on-chip vector multiprocessor for accelerating the transmission line modeling (TLM) algorithm with an architecture capable of exploiting the two primary forms of parallelism in the code, thread and data level parallelism. Theoretical results demonstrate an order of magnitude reduction in the dynamic instruction count for a scalar-processor/vector-coprocessor configuration at a vector length of sixteen 32-bit single-precision elements. Furthermore, a multi-vector SoC architecture consisting of ten such vector accelerators provides a near-linear theoretical performance benefit of the order of 88% in three out of four benchmark configurations which is orthogonal to the benefit realized by vectorization alone. We discuss in detail this potent architecture and present implementation data for the 2-way multi-processor VLSI macrocell.