Hsin-Jung Yang, Kermin Fleming, Michael Adler, J. Emer
Title: Optimizing under abstraction: Using prefetching to improve FPGA performance
Venue: 2013 23rd International Conference on Field programmable Logic and Applications
Publication date: 2013-10-24
DOI: 10.1109/FPL.2013.6645522 (https://doi.org/10.1109/FPL.2013.6645522)
Citation count: 8
Abstract
In an effort to speed the development of FPGA-based accelerators, recent research has focused on providing FPGA developers with memory and communications abstractions. Because abstraction decouples the function of these interfaces from their implementation, these new interfaces present an enormous opportunity for optimization. In this paper we examine stride prefetching as a means of improving the performance of an automatically synthesized, abstract memory hierarchy. We demonstrate, by applying our technique to several large benchmarks, that prefetching can improve preexisting application runtime by 15% on average, and up to 40%, without requiring program modification.
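To make the term "stride prefetching" concrete, the following is a minimal software sketch of a generic stride predictor. It is not the authors' hardware design: the class names, the `stream_id` key, and the "same stride seen twice in a row" confidence rule are all assumptions chosen for illustration. The idea is that once consecutive accesses from one stream are separated by a constant stride, the next address can be requested before the application asks for it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confident: bool = False  # True once the same nonzero stride repeats


class StridePrefetcher:
    """Toy stride predictor keyed by a hypothetical stream id."""

    def __init__(self) -> None:
        self.table: Dict[int, StrideEntry] = {}

    def access(self, stream_id: int, addr: int) -> Optional[int]:
        """Record an access; return an address to prefetch, or None."""
        entry = self.table.get(stream_id)
        if entry is None:
            # First access from this stream: nothing to predict yet.
            self.table[stream_id] = StrideEntry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        # Predict only when the new stride matches the previous one.
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if entry.confident else None


pf = StridePrefetcher()
# A regular stream with stride 8: predictions start on the third access.
predictions = [pf.access(0, a) for a in (100, 108, 116, 124)]
# An irregular stream never builds confidence, so nothing is prefetched.
irregular = [pf.access(1, a) for a in (0, 5, 7)]
```

Because the predictor only observes the address stream, it can sit behind an abstract memory interface without any change to the application, which is the property the abstract emphasizes.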