Optimization of HW/SW Co-Design: Relevance to Configurable Processor and FPGA Technology
S. Xu, H. Pollitt-Smith
2007 Canadian Conference on Electrical and Computer Engineering, 2007-04-22. DOI: 10.1109/CCECE.2007.423
This paper presents a methodology for optimizing HW/SW co-design using emerging configurable processor and FPGA technologies. The methodology is illustrated by optimizing a discrete cosine transform (DCT) for image compression on Tensilica's Xtensa LX core and a Xilinx Virtex-II Pro device. The paper describes the successive optimization steps applied to a 2-D DCT, including extending the base processor with different instruction-set extensions to speed up software execution. For a simple 2-D DCT implementation, adding a 4-way SIMD (single instruction, multiple data) instruction yields a 26.76-fold speedup at moderate hardware cost. The optimized 4-way SIMD processor is then implemented on the FPGA board to verify the design. On-board calculation shows a further significant speedup compared with the instruction-set simulation results. The paper also discusses the HW versus SW optimization strategy and the trade-offs between speed and hardware cost.
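The abstract gives no code. The following is a minimal sketch of what a 4-way SIMD instruction changes in a 2-D DCT, written in Python for readability. It rests on several assumptions, none taken from the paper:

- an 8x8 block,
- an orthonormal floating-point DCT-II computed separably (rows first, then columns),
- a hypothetical `mac4` helper standing in for a custom 4-lane multiply-accumulate instruction.

A hardware implementation on the Xtensa core would more likely use fixed-point arithmetic and a processor-defined instruction rather than a Python loop.

```python
import math

N = 8      # assumed block size (8x8); the abstract does not state it
LANES = 4  # width of the hypothetical 4-way SIMD operation


def dct_basis(n=N):
    """Orthonormal DCT-II basis: c[u][x] = a(u) * cos((2x + 1) * u * pi / (2n))."""
    return [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]


def dct2d_scalar(block):
    """Reference separable 2-D DCT: one scalar multiply-accumulate per step."""
    n = len(block)
    c = dct_basis(n)
    # 1-D DCT along each row
    tmp = [[sum(c[u][x] * row[x] for x in range(n)) for u in range(n)] for row in block]
    # 1-D DCT along each column of the row-transformed block
    return [[sum(c[v][y] * tmp[y][u] for y in range(n)) for u in range(n)]
            for v in range(n)]


def mac4(acc, k, xs):
    """Hypothetical 4-lane SIMD multiply-accumulate: acc[i] += k * xs[i] for each lane.
    A custom instruction would do all lanes at once; here it is an ordinary loop."""
    return [a + k * x for a, x in zip(acc, xs)]


def dct2d_simd4(block):
    """Same transform, but each inner step updates LANES adjacent outputs together."""
    n = len(block)
    assert n % LANES == 0, "block width must be a multiple of the lane count"
    c = dct_basis(n)
    ct = [list(col) for col in zip(*c)]  # transposed basis: ct[x][u] == c[u][x]
    tmp = [[0.0] * n for _ in range(n)]
    for y in range(n):                   # row pass: fills tmp[y][u0:u0+LANES]
        for u0 in range(0, n, LANES):
            acc = [0.0] * LANES
            for x in range(n):
                acc = mac4(acc, block[y][x], ct[x][u0:u0 + LANES])
            tmp[y][u0:u0 + LANES] = acc
    out = [[0.0] * n for _ in range(n)]
    for v in range(n):                   # column pass: fills out[v][u0:u0+LANES]
        for u0 in range(0, n, LANES):
            acc = [0.0] * LANES
            for y in range(n):
                acc = mac4(acc, c[v][y], tmp[y][u0:u0 + LANES])
            out[v][u0:u0 + LANES] = acc
    return out


if __name__ == "__main__":
    blk = [[(7 * x + 3 * y) % 16 for x in range(N)] for y in range(N)]  # arbitrary test pattern
    ref, vec = dct2d_scalar(blk), dct2d_simd4(blk)
    print("DC coefficient:", ref[0][0])
    print("max |scalar - simd4|:",
          max(abs(a - b) for ra, rb in zip(ref, vec) for a, b in zip(ra, rb)))
```

For an 8x8 block, the scalar version issues 2 x 8^3 = 1024 multiply-accumulates. The grouped version issues 256 four-lane operations. This sketch only illustrates the data layout that makes four adjacent outputs share one operation. It does not model the Xtensa pipeline and does not reproduce the 26.76-fold speedup, which is the authors' measured result.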