Performance of remote FPGA-based coprocessors for image-processing applications
Author: Domingo Benítez
Published in: Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools, 2002-09-04
DOI: https://doi.org/10.1109/DSD.2002.1115378
This paper evaluates the performance of image-processing applications on FPGA-based coprocessors that are part of general-purpose computers. Our experiments show that the maximum speed-up depends on the amount of data processed by the coprocessor. For images of 256×256 pixels, a moderate FPGA capacity of 10E+5 CLBs provides two orders of magnitude of performance improvement over a Pentium III processor for most of our benchmarks. However, memory organization and the host bus degrade these results. The benchmarks that can exhibit high performance improvement would require about 200 memory banks of 256 bytes each and a host bandwidth as high as 30 GB/s. Our quantitative approach explains why some currently available FPGA-based coprocessors do not reach the achievable level of performance for some image-processing applications.
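
To put the figures quoted in the abstract on a common scale, the following is a minimal back-of-the-envelope sketch, not the paper's own model. It assumes 8-bit grayscale pixels (the abstract does not state the pixel depth) and decimal gigabytes (1 GB = 10^9 bytes); the function names are hypothetical and chosen only for illustration.

```python
# Rough arithmetic on the numbers quoted in the abstract.
# ASSUMPTION: 1 byte per pixel (8-bit grayscale); the abstract gives no pixel depth.
# ASSUMPTION: 1 GB = 10**9 bytes.

IMAGE_SIDE = 256          # pixels per side, from the abstract
BYTES_PER_PIXEL = 1       # assumed, not stated in the source
BANKS = 200               # memory banks, from the abstract
BANK_BYTES = 256          # bytes per bank, from the abstract
HOST_BANDWIDTH = 30e9     # bytes per second (30 GB/s), from the abstract


def image_bytes(side=IMAGE_SIDE, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of one square image."""
    return side * side * bytes_per_pixel


def bank_capacity(banks=BANKS, bank_bytes=BANK_BYTES):
    """Total capacity in bytes of all memory banks together."""
    return banks * bank_bytes


def transfer_time_s(n_bytes, bandwidth=HOST_BANDWIDTH):
    """Ideal time in seconds to move n_bytes over the host bus, ignoring latency."""
    return n_bytes / bandwidth


if __name__ == "__main__":
    size = image_bytes()
    print(f"image size:          {size} bytes")
    print(f"total bank capacity: {bank_capacity()} bytes")
    print(f"ideal transfer time: {transfer_time_s(size) * 1e6:.2f} microseconds")
```

Under these assumptions, one 256×256 image occupies 65,536 bytes, the 200 banks together hold 51,200 bytes, and an ideal 30 GB/s host bus would move one image in roughly 2.2 microseconds. Changing `BYTES_PER_PIXEL` rescales the image size and transfer time proportionally.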