D. Stevens, N. Glynn, Panagiotis Galiatsatos, V. Chouliaras, D. Reisis
{"title":"Evaluating the performance of a configurable, extensible VLIW processor in FFT execution","authors":"D. Stevens, N. Glynn, Panagiotis Galiatsatos, V. Chouliaras, D. Reisis","doi":"10.1109/ICECS.2009.5410773","DOIUrl":null,"url":null,"abstract":"This paper presents the setup and the evaluation of the LE1 configurable, extensible, multi-cluster VLIW processor system in FFT execution. The input code is a C implementation of the FFT algorithm and we evaluate its performance on the LE1 simulator for multiple CPU configurations (issue width, execution resource mix, custom instruction) and compiler optimizations (inlining, loop unrolling) in an effort to optimize the cycle count. We identify the prevailing LE1 configurations, with respect to the FFT cycle performance, their silicon area and the power dissipation. Finally, we compare these results to a fully systolic single datapath delay feedback (SDF) VLSI FFT architecture derived from the same C code.","PeriodicalId":343974,"journal":{"name":"2009 16th IEEE International Conference on Electronics, Circuits and Systems - (ICECS 2009)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 16th IEEE International Conference on Electronics, Circuits and Systems - (ICECS 2009)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECS.2009.5410773","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
This paper presents the setup and the evaluation of the LE1 configurable, extensible, multi-cluster VLIW processor system in FFT execution. The input code is a C implementation of the FFT algorithm and we evaluate its performance on the LE1 simulator for multiple CPU configurations (issue width, execution resource mix, custom instruction) and compiler optimizations (inlining, loop unrolling) in an effort to optimize the cycle count. We identify the prevailing LE1 configurations, with respect to the FFT cycle performance, their silicon area and the power dissipation. Finally, we compare these results to a fully systolic single datapath delay feedback (SDF) VLSI FFT architecture derived from the same C code.