{"title":"Genetic algorithms in software and in hardware-a performance analysis of workstation and custom computing machine implementations","authors":"P. Graham, B. Nelson","doi":"10.1109/FPGA.1996.564847","DOIUrl":null,"url":null,"abstract":"The paper analyzes the performance differences found between the hardware and software versions of a genetic algorithm used to solve the travelling salesman problem. The hardware implementation requires 4 FPGA's on a Splash 2 board and runs at 11 MHz. The software implementation was written in C++ and executed on a 125 MHz HP PA-RISC workstation. The software run time was more than four times that of the hardware (up to 50 times as many cycles). The paper analyses the contribution made to this performance difference by the following hardware features: hard-wired control, custom address generation logic, memory hierarchy efficiency, and both fine- and course-grained parallelism. The results indicate that the major contributor to the hardware performance advantage is fine-grained parallelism-RTL-level parallelism due to operator pipelining. This alone accounts for as much as a 38X cycle-count reduction over the software in one section of the algorithm. The next major contributors include hard-wired control and custom address generation which account for as much as a 3X speedup in other sections of the algorithm. Finally, memory hierarchy inefficiencies in the software (cache misses and paging) and coarse-grained parallelism in the hardware are each shown to have lesser effect on the performance difference between the implementations.","PeriodicalId":244873,"journal":{"name":"1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"94","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPGA.1996.564847","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 94
Abstract
The paper analyzes the performance differences found between the hardware and software versions of a genetic algorithm used to solve the travelling salesman problem. The hardware implementation requires 4 FPGA's on a Splash 2 board and runs at 11 MHz. The software implementation was written in C++ and executed on a 125 MHz HP PA-RISC workstation. The software run time was more than four times that of the hardware (up to 50 times as many cycles). The paper analyses the contribution made to this performance difference by the following hardware features: hard-wired control, custom address generation logic, memory hierarchy efficiency, and both fine- and course-grained parallelism. The results indicate that the major contributor to the hardware performance advantage is fine-grained parallelism-RTL-level parallelism due to operator pipelining. This alone accounts for as much as a 38X cycle-count reduction over the software in one section of the algorithm. The next major contributors include hard-wired control and custom address generation which account for as much as a 3X speedup in other sections of the algorithm. Finally, memory hierarchy inefficiencies in the software (cache misses and paging) and coarse-grained parallelism in the hardware are each shown to have lesser effect on the performance difference between the implementations.
本文分析了一种用于求解旅行商问题的遗传算法的硬件和软件版本之间的性能差异。硬件实现需要4个FPGA在一个Splash 2板上,运行频率为11mhz。软件实现用c++编写,在125 MHz HP PA-RISC工作站上运行。软件的运行时间是硬件的四倍多(最多是50倍的周期)。本文分析了以下硬件特性对这种性能差异的贡献:硬连接控制、自定义地址生成逻辑、内存层次结构效率以及细粒度和粗粒度并行性。结果表明,硬件性能优势的主要贡献者是细粒度并行性——由于操作符流水线而产生的rtl级并行性。仅这一点就可以在算法的一个部分中减少多达38倍的周期计数。下一个主要贡献者包括硬连线控制和自定义地址生成,它们在算法的其他部分中占了多达3倍的加速。最后,软件中的内存层次结构效率低下(缓存丢失和分页)和硬件中的粗粒度并行性对实现之间的性能差异的影响都较小。