A hardware cache memcpy accelerator
Authors: Stephan Wong, F. Duarte, S. Vassiliadis
Published in: 2006 IEEE International Conference on Field Programmable Technology, 2006-12-01
DOI: https://doi.org/10.1109/FPT.2006.270305
Citations: 15
Abstract
In this paper, we present a hardware solution for performing the commonly used memcpy operation, with the goal of reducing the time spent on the actual memory copies. This is accomplished by taking advantage of the cache found next to many current-day (embedded) processors. The solution as currently presented assumes that the data to be copied is already in the cache and is aligned to the cache-line size. We present the concept and implementation details of the proposed hardware module, along with the system used to evaluate both our hardware and an optimized software implementation of the memcpy function. Experimental results show that the proposed hardware solution is at least 79% faster than an optimized hand-coded software solution.
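To make the alignment assumption concrete, the following is a minimal software sketch of a copy routine that works one cache line at a time. It is not the authors' hardware module, nor their optimized software baseline. The 32-byte line size (`LINE_BYTES`) and the function name `copy_lines` are hypothetical choices made for illustration. The sketch takes the line-granular path only when both pointers are line-aligned and the length is a whole number of lines, and otherwise falls back to the standard `memcpy`. It cannot check the abstract's other assumption, that the data is already in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache-line size in bytes; hypothetical, not taken from the paper. */
#define LINE_BYTES 32u

/* True when p sits on a cache-line boundary. */
static bool line_aligned(const void *p)
{
    return ((uintptr_t)p % LINE_BYTES) == 0;
}

/*
 * Copy n bytes from src to dst; the regions must not overlap (as with memcpy).
 * Returns true if the line-granular path was taken, false if the alignment
 * or size precondition failed and the call fell back to plain memcpy.
 */
bool copy_lines(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    if (!line_aligned(dst) || !line_aligned(src) || n % LINE_BYTES != 0) {
        memcpy(dst, src, n);
        return false;
    }

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Move one whole line per iteration; a fixed-size memcpy is typically
       expanded inline by the compiler into a few wide loads and stores. */
    for (size_t off = 0; off < n; off += LINE_BYTES) {
        memcpy(d + off, s + off, LINE_BYTES);
    }
    return true;
}
```

A caller could pass buffers obtained from an aligned allocator and use the return value to see whether the preconditions held. A hardware unit working under the same assumptions would have no fallback path, which is why the abstract states the alignment requirement explicitly.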