Genetic algorithms in software and in hardware: a performance analysis of workstation and custom computing machine implementations

P. Graham, B. Nelson
1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines, 17 April 1996. DOI: 10.1109/FPGA.1996.564847
Citations: 94

Abstract

The paper analyzes the performance differences between the hardware and software versions of a genetic algorithm used to solve the travelling salesman problem. The hardware implementation uses four FPGAs on a Splash 2 board and runs at 11 MHz. The software implementation was written in C++ and executed on a 125 MHz HP PA-RISC workstation. The software run time was more than four times that of the hardware (up to 50 times as many clock cycles). The paper analyzes how much each of the following hardware features contributes to this performance difference: hard-wired control, custom address generation logic, memory hierarchy efficiency, and both fine- and coarse-grained parallelism. The results indicate that the major contributor to the hardware performance advantage is fine-grained parallelism, that is, RTL-level parallelism due to operator pipelining. This alone accounts for as much as a 38X cycle-count reduction over the software in one section of the algorithm. The next major contributors are hard-wired control and custom address generation, which account for as much as a 3X speedup in other sections of the algorithm. Finally, memory hierarchy inefficiencies in the software (cache misses and paging) and coarse-grained parallelism in the hardware are each shown to have a smaller effect on the performance difference between the implementations.
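The run-time and cycle-count figures above are linked by simple arithmetic: wall-clock time is cycles divided by clock frequency, so the software/hardware cycle ratio equals the run-time ratio multiplied by the clock-frequency ratio (125 MHz / 11 MHz, about 11.4). The sketch below checks this, treating a run-time ratio of exactly 4 as a lower bound since the abstract says "more than four times". It also includes an idealized model of operator pipelining that is an assumption for illustration, not the paper's cycle-count model: a k-stage pipeline that accepts one new operation per cycle finishes n operations in k + n - 1 cycles, versus n * k cycles when each operation runs all its stages before the next begins. The function names and the 8-stage, 1000-operation example are hypothetical.

```python
def implied_cycle_ratio(runtime_ratio: float, sw_clock_mhz: float, hw_clock_mhz: float) -> float:
    """Software/hardware cycle-count ratio implied by a run-time ratio.

    time = cycles / frequency, so cycles = time * frequency and
    sw_cycles / hw_cycles = (sw_time / hw_time) * (sw_freq / hw_freq).
    """
    return runtime_ratio * (sw_clock_mhz / hw_clock_mhz)


def sequential_cycles(n_ops: int, stages: int) -> int:
    """Idealized cost when every operation runs all of its stages before the next one starts."""
    return n_ops * stages


def pipelined_cycles(n_ops: int, stages: int) -> int:
    """Idealized cost for a pipeline that accepts one new operation per cycle.

    The first result appears after `stages` cycles; each later result follows one cycle apart.
    """
    return stages + n_ops - 1 if n_ops > 0 else 0


if __name__ == "__main__":
    # Clock rates from the abstract; a run-time ratio of 4 is a lower bound ("more than four times").
    ratio = implied_cycle_ratio(4.0, 125.0, 11.0)
    print(f"implied cycle ratio >= {ratio:.1f}")  # prints "implied cycle ratio >= 45.5"

    # Hypothetical 8-stage datapath processing 1000 operations (illustrative numbers only).
    n, k = 1000, 8
    print(sequential_cycles(n, k), pipelined_cycles(n, k))  # prints "8000 1007"
```

A run-time ratio of at least 4 therefore implies a cycle ratio of at least about 45, which is consistent with the "up to 50 times as many cycles" reported above. In the toy pipeline model the speedup approaches the number of stages as n grows (8000 / 1007 is just under 8), which suggests how deep operator pipelines can produce large cycle-count reductions in individual sections of an algorithm.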