Authors: A. Azevedo, L. Agostini, F. Wagner, S. Bampi
Published in: 16th IEEE International Workshop on Rapid System Prototyping (RSP'05), 2005-06-08
DOI: 10.1109/RSP.2005.10 (https://doi.org/10.1109/RSP.2005.10)
Accelerating a multiprocessor reconfigurable architecture with pipelined VLIW units
The X4CP32 is an architecture that combines the parallel and reconfigurable paradigms. It consists of a grid of reconfigurable and programming units (RPUs), each containing 4 cells (with a microprocessor in each cell), responsible for all the processing and program flow. This paper presents architectural modifications to the X4CP32 that increase its performance. The RPU was implemented according to the VLIW (very long instruction word) methodology, and the cells were redesigned with a pipelined implementation. These improvements raised the maximum IPC (instructions per cycle) of the RPU from 0.5 to 4, with an area overhead of 26%. To evaluate the new architecture, versions of the 2D discrete cosine transform, Montgomery modular multiplication, and color space conversion were mapped onto both the baseline architecture and the pipelined VLIW architecture.
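The peak figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the 26% overhead is measured relative to the baseline RPU area. The results are upper bounds on peak issue rate, not measured speedups for the mapped applications.

```python
# Figures quoted in the abstract.
BASELINE_PEAK_IPC = 0.5  # maximum IPC of one baseline RPU
VLIW_PEAK_IPC = 4.0      # maximum IPC of one pipelined VLIW RPU
AREA_OVERHEAD = 0.26     # 26% extra area (assumed relative to the baseline RPU)
CELLS_PER_RPU = 4        # cells in each RPU


def peak_ipc_gain(baseline_ipc: float, improved_ipc: float) -> float:
    """Ratio of the improved peak IPC to the baseline peak IPC."""
    return improved_ipc / baseline_ipc


def gain_per_unit_area(gain: float, area_overhead: float) -> float:
    """Peak IPC gain normalized by the relative area of the new design."""
    return gain / (1.0 + area_overhead)


gain = peak_ipc_gain(BASELINE_PEAK_IPC, VLIW_PEAK_IPC)
density = gain_per_unit_area(gain, AREA_OVERHEAD)
per_cell = VLIW_PEAK_IPC / CELLS_PER_RPU

print(f"peak IPC gain:          {gain:.1f}x")
print(f"gain per unit area:     {density:.2f}x")
print(f"peak IPC per cell (new): {per_cell:.1f}")
```

Under these assumptions, a peak IPC of 4 across 4 cells corresponds to at most one instruction per cell per cycle. How much of that peak a real kernel reaches depends on how well it fills the VLIW slots, which the abstract does not quantify.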