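Architecture Exploration for Energy-Efficient Embedded Vision Applications: From General Purpose Processor to Domain Specific Accelerator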

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). Publication date: 2016-07-01. DOI: 10.1109/ISVLSI.2016.112

Maria Malik, Farnoud Farahmand, P. Otto, N. Akhlaghi, T. Mohsenin, S. Sikdar, H. Homayoun


Citations: 14

Abstract



OpenCV applications are computationally intensive computer vision workloads. The demand for low-power yet high-performance real-time processing of OpenCV embedded vision applications has led to customized implementations on state-of-the-art embedded processing platforms. As the industry moves to heterogeneous platforms that integrate a single-core or multicore CPU with on-chip FPGA and GPU accelerators, the question of which platform and which implementation, hardware or software, is best suited for energy-efficient processing of this class of applications is becoming important. In this paper, we seek to answer this question through detailed hardware and software implementations of OpenCV applications, together with methodical measurement and comprehensive analysis of their power and performance on state-of-the-art heterogeneous embedded processing platforms. The results show that, in addition to application behavior, image size is an important factor in determining which platform is the most energy-efficient, as measured by energy-delay product (EDP), among hardware accelerators on FPGA and software accelerators on GPU and multicore CPUs. Hardware implementation on ZYNQ is shown to deliver the highest performance and energy efficiency for image sizes of 500x500 or less, whereas software GPU implementation is the most efficient and achieves the highest speedup for larger image sizes. In addition, for compute-intensive vision applications the gap between FPGA, CPU and GPU narrows as image size increases, whereas for non-intensive applications a large performance and EDP gap is observed between the studied platforms as image size increases.
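The abstract ranks platforms by energy-delay product (EDP). As a minimal sketch, assuming the standard definition EDP = energy x delay = average power x execution time² (the paper's exact measurement procedure is not described here), the comparison could be expressed as below. The `Measurement` record, the `most_efficient` and `speedup` helpers, and all power and timing numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Measurement:
    """One measured run of a vision kernel on one platform (hypothetical record type)."""
    platform: str
    avg_power_w: float  # average power drawn during the run, in watts
    exec_time_s: float  # execution time (delay) of the run, in seconds

    @property
    def energy_j(self) -> float:
        # Energy in joules = average power x execution time
        return self.avg_power_w * self.exec_time_s

    @property
    def edp(self) -> float:
        # Energy-delay product in J*s = energy x delay; lower is better
        return self.energy_j * self.exec_time_s


def most_efficient(runs: Sequence[Measurement]) -> Measurement:
    """Pick the run with the lowest EDP, i.e. the most energy-efficient platform."""
    return min(runs, key=lambda m: m.edp)


def speedup(baseline: Measurement, run: Measurement) -> float:
    """Speedup of `run` relative to `baseline` (ratio of execution times)."""
    return baseline.exec_time_s / run.exec_time_s


# Illustrative numbers only, NOT measurements from the paper.
runs = [
    Measurement("multicore CPU", avg_power_w=2.0, exec_time_s=0.50),
    Measurement("GPU", avg_power_w=5.0, exec_time_s=0.10),
    Measurement("FPGA (Zynq)", avg_power_w=1.5, exec_time_s=0.20),
]

best = most_efficient(runs)
print(best.platform, round(best.edp, 3), round(speedup(runs[0], best), 1))
```

With these made-up numbers the GPU has the lowest EDP (0.05 J·s, versus 0.06 J·s for the FPGA and 0.5 J·s for the CPU) even though it draws the most power. Because EDP weights delay twice, a faster but hungrier platform can beat a frugal but slower one. This is one reason why, in the abstract's findings, the winning platform shifts as image size, and therefore execution time, changes.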
