Parallel Processing on FPGAs: The Effect of Profiling on Performance
Authors: Xiaoguang Li, S. Areibi, R. Dony
Published in: 2006 6th International Workshop on System on Chip for Real Time Applications, 2006-12-01
DOI: https://doi.org/10.1109/IWSOC.2006.348232
The processing elements, logic resources, and on-chip block RAMs of modern FPGAs can be used not only for prototyping custom hardware modules but also for parallel processing, by implementing multiple processors that cooperate on a single task. This paper compares a single-processor implementation with two dual-processor implementations of the widely used radix-2 n-point FFT algorithm (Cooley and Tukey, 1965) in terms of processing speed and FPGA resource utilization. In the first dual-processor implementation, the work is partitioned according to the computational complexity of the radix-2 FFT, O(n log n). In the second, the partitioning is based on detailed profiling of each line of code in the single-processor implementation. The results show that the first dual-processor implementation is on average 1.3 times faster than the single-processor implementation, whereas the second is about 1.9 times faster, which is very close to the expected speedup. This shows that detailed profiling, which takes all contributing factors into account, is crucial for identifying the bottlenecks of an algorithm, so that the algorithm can be mapped efficiently onto a multiprocessor system.
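The difference between the two partitioning strategies can be illustrated with a minimal Python sketch. It is not the authors' method or code. It assumes the work can be divided into independent chunks with no communication cost, and every number in it is hypothetical rather than taken from the paper. The names `dual_speedup` and `best_split` are introduced here only for illustration.

```python
from typing import Sequence, Tuple


def dual_speedup(costs: Sequence[float], split: int) -> float:
    """Ideal two-processor speedup when chunks [0:split] run on one
    processor and chunks [split:] on the other, fully in parallel.
    Communication and synchronisation costs are ignored (assumption)."""
    total = sum(costs)
    slower_side = max(sum(costs[:split]), sum(costs[split:]))
    return total / slower_side


def best_split(costs: Sequence[float]) -> Tuple[int, float]:
    """Return the split index that gives the highest ideal speedup,
    together with that speedup."""
    split = max(range(1, len(costs)), key=lambda s: dual_speedup(costs, s))
    return split, dual_speedup(costs, split)


# Hypothetical cost model: an operation count that treats all four
# chunks as equally expensive.
modelled = [4.0, 4.0, 4.0, 4.0]

# Hypothetical profile of the same chunks: the first one is slower in
# practice (for example because of memory accesses the model ignores).
measured = [9.0, 4.0, 4.0, 3.0]

# Partition chosen from the cost model, then evaluated on measured costs.
model_split, _ = best_split(modelled)
speedup_model = dual_speedup(measured, model_split)

# Partition chosen directly from the measured profile.
profile_split, speedup_profile = best_split(measured)

print(f"model-based split at {model_split}: speedup {speedup_model:.2f}")
print(f"profile-based split at {profile_split}: speedup {speedup_profile:.2f}")
```

In this toy setup the model-based split looks perfectly balanced on paper but leaves one processor with most of the real work, while the profile-based split balances the measured costs and comes closer to the ideal factor of two.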