{"title":"A VLIW-Vector co-processor design for accelerating Basic Linear Algebraic Operations in OpenCV","authors":"Venkata Ganapathi Puppala","doi":"10.1109/ISVDAT.2014.6881085","DOIUrl":null,"url":null,"abstract":"OpenCV is a widely used computer vision library written in C++. Basic Linear Algebraic Operations (BLAOP) involving matrices are at the heart of OpenCV. Though OpenCV provides ubiquity in the computer vision field, it runs slow when ported on embedded processors. Accelerating the LAOPs using a co-processor certainly helps improving the throughput. In this paper we present a floating point VLIW-Vector Co-processor Architecture with Vector Floating Point Datapath (VFPDP) and a 4-slot VLIW processor core to accelerate BLAOps achieving performance of two GFLOPS when run at 500MHz clock frequency. We also demonstrate a detailed mapping strategy of One sided Jacobi Singular Value Decomposition (OJSVD) algorithm onto the proposed architecture. The proposed architecture is designed using Verilog HDL and it is synthesized using Synopsis Design Compiler with 28nm TSMC target libraries. The clock period is set to 2ns and the timing constraints are met. Using the Altera's SOPC builder, an experimental system is created with the co-processor interfaced to the NIOS II soft processor and implemented in Cyclone IV FPGA. The OJSVD algorithm is ported onto both the standalone NIOS II processor based system and the system with the proposed co-processor. The results show that 15X performance improvement achieved with this co-processor.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"18th International Symposium on VLSI Design and Test","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVDAT.2014.6881085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
OpenCV is a widely used computer vision library written in C++. Basic Linear Algebraic Operations (BLAOP) involving matrices are at the heart of OpenCV. Though OpenCV provides ubiquity in the computer vision field, it runs slow when ported on embedded processors. Accelerating the LAOPs using a co-processor certainly helps improving the throughput. In this paper we present a floating point VLIW-Vector Co-processor Architecture with Vector Floating Point Datapath (VFPDP) and a 4-slot VLIW processor core to accelerate BLAOps achieving performance of two GFLOPS when run at 500MHz clock frequency. We also demonstrate a detailed mapping strategy of One sided Jacobi Singular Value Decomposition (OJSVD) algorithm onto the proposed architecture. The proposed architecture is designed using Verilog HDL and it is synthesized using Synopsis Design Compiler with 28nm TSMC target libraries. The clock period is set to 2ns and the timing constraints are met. Using the Altera's SOPC builder, an experimental system is created with the co-processor interfaced to the NIOS II soft processor and implemented in Cyclone IV FPGA. The OJSVD algorithm is ported onto both the standalone NIOS II processor based system and the system with the proposed co-processor. The results show that 15X performance improvement achieved with this co-processor.