Optimization of HW/SW Co-Design: Relevance to Configurable Processor and FPGA Technology

2007 Canadian Conference on Electrical and Computer Engineering. Pub Date: 2007-04-22. DOI: 10.1109/CCECE.2007.423

S. Xu, H. Pollitt-Smith


Citations: 3

Abstract

This paper presents a methodology for optimizing HW/SW co-design based on emerging configurable processor and FPGA technologies. The methodology is illustrated by optimizing a discrete cosine transform (DCT) for image compression on Tensilica's Xtensa LX core and a Xilinx Virtex-II Pro device. The optimization steps for a 2-D DCT are described, including adding different processor instruction sets to the base processor to speed up software execution. For a simple 2-D DCT implementation, adding a 4-way SIMD (single instruction, multiple data) instruction yields a 26.76-fold speedup at moderate hardware cost. The optimized 4-way SIMD processor is implemented on the FPGA board to verify the design, and on-board calculation shows a further significant speedup over the instruction-set simulation results. The HW vs. SW optimization strategy and the trade-offs between speed and HW cost are also presented.
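To make the kind of computation in the abstract concrete, here is a minimal Python sketch of a 2-D DCT computed by row-column decomposition, together with a variant whose inner loop updates four accumulators per step as a rough model of a 4-way SIMD multiply-accumulate. It is an illustration under stated assumptions, not the authors' implementation: the 8x8 block size, the orthonormal DCT-II scaling, and the names `dct_1d`, `dct_1d_4way`, and `dct_2d` are all hypothetical, and nothing here reflects the actual Xtensa LX instruction extensions or reproduces the reported speedup.

```python
import math

N = 8      # assumed block size (typical for image compression, not stated in the abstract)
LANES = 4  # models the "4-way" SIMD width mentioned in the abstract


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for output index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Scalar reference: one multiply-accumulate per inner-loop step."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for i in range(n):
            acc += x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(_scale(k, n) * acc)
    return out


def dct_1d_4way(x):
    """Same transform, with four lane accumulators updated per modelled step."""
    n = len(x)
    assert n % LANES == 0, "input length must be a multiple of the lane count"
    out = []
    for k in range(n):
        lanes = [0.0] * LANES
        for base in range(0, n, LANES):
            # One modelled SIMD operation: all four lanes accumulate together.
            for lane in range(LANES):
                i = base + lane
                lanes[lane] += x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        # Horizontal reduction of the four partial sums.
        out.append(_scale(k, n) * sum(lanes))
    return out


def dct_2d(block, transform=dct_1d):
    """2-D DCT: apply the 1-D transform to every row, then to every column."""
    rows = [transform(r) for r in block]
    cols = [transform(c) for c in zip(*rows)]  # zip(*rows) walks the columns
    return [list(r) for r in zip(*cols)]       # transpose back to row-major
```

Calling `dct_2d(block, dct_1d_4way)` on an 8x8 list of floats should match `dct_2d(block)` up to floating-point rounding, since the two variants differ only in the order in which partial sums are accumulated.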
