在OpenCV中加速基本线性代数运算的VLIW-Vector协处理器设计

Venkata Ganapathi Puppala
{"title":"在OpenCV中加速基本线性代数运算的VLIW-Vector协处理器设计","authors":"Venkata Ganapathi Puppala","doi":"10.1109/ISVDAT.2014.6881085","DOIUrl":null,"url":null,"abstract":"OpenCV is a widely used computer vision library written in C++. Basic Linear Algebraic Operations (BLAOP) involving matrices are at the heart of OpenCV. Though OpenCV provides ubiquity in the computer vision field, it runs slow when ported on embedded processors. Accelerating the LAOPs using a co-processor certainly helps improving the throughput. In this paper we present a floating point VLIW-Vector Co-processor Architecture with Vector Floating Point Datapath (VFPDP) and a 4-slot VLIW processor core to accelerate BLAOps achieving performance of two GFLOPS when run at 500MHz clock frequency. We also demonstrate a detailed mapping strategy of One sided Jacobi Singular Value Decomposition (OJSVD) algorithm onto the proposed architecture. The proposed architecture is designed using Verilog HDL and it is synthesized using Synopsis Design Compiler with 28nm TSMC target libraries. The clock period is set to 2ns and the timing constraints are met. Using the Altera's SOPC builder, an experimental system is created with the co-processor interfaced to the NIOS II soft processor and implemented in Cyclone IV FPGA. The OJSVD algorithm is ported onto both the standalone NIOS II processor based system and the system with the proposed co-processor. The results show that 15X performance improvement achieved with this co-processor.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"A VLIW-Vector co-processor design for accelerating Basic Linear Algebraic Operations in OpenCV\",\"authors\":\"Venkata Ganapathi Puppala\",\"doi\":\"10.1109/ISVDAT.2014.6881085\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"OpenCV is a widely used computer vision library written in C++. Basic Linear Algebraic Operations (BLAOP) involving matrices are at the heart of OpenCV. Though OpenCV provides ubiquity in the computer vision field, it runs slow when ported on embedded processors. Accelerating the LAOPs using a co-processor certainly helps improving the throughput. In this paper we present a floating point VLIW-Vector Co-processor Architecture with Vector Floating Point Datapath (VFPDP) and a 4-slot VLIW processor core to accelerate BLAOps achieving performance of two GFLOPS when run at 500MHz clock frequency. We also demonstrate a detailed mapping strategy of One sided Jacobi Singular Value Decomposition (OJSVD) algorithm onto the proposed architecture. The proposed architecture is designed using Verilog HDL and it is synthesized using Synopsis Design Compiler with 28nm TSMC target libraries. The clock period is set to 2ns and the timing constraints are met. Using the Altera's SOPC builder, an experimental system is created with the co-processor interfaced to the NIOS II soft processor and implemented in Cyclone IV FPGA. The OJSVD algorithm is ported onto both the standalone NIOS II processor based system and the system with the proposed co-processor. The results show that 15X performance improvement achieved with this co-processor.\",\"PeriodicalId\":217280,\"journal\":{\"name\":\"18th International Symposium on VLSI Design and Test\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"18th International Symposium on VLSI Design and Test\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISVDAT.2014.6881085\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"18th International Symposium on VLSI Design and Test","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVDAT.2014.6881085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

OpenCV是一个使用c++编写的广泛使用的计算机视觉库。涉及矩阵的基本线性代数运算(BLAOP)是OpenCV的核心。尽管OpenCV在计算机视觉领域提供了无处不在的应用,但它在移植到嵌入式处理器上时运行缓慢。使用协处理器加速LAOPs当然有助于提高吞吐量。在本文中,我们提出了一个带有矢量浮点数据路径(VFPDP)和4槽VLIW处理器核心的浮点VLIW-矢量协处理器架构,以加速BLAOps,在500MHz时钟频率下运行时达到两个GFLOPS的性能。我们还演示了单侧Jacobi奇异值分解(OJSVD)算法到所提出的体系结构的详细映射策略。该架构采用Verilog HDL进行设计,并采用28纳米TSMC目标库的概要设计编译器进行合成。时钟周期设置为2ns,满足时间约束。利用Altera的SOPC构建器,通过NIOS II软处理器接口的协处理器创建了一个实验系统,并在Cyclone IV FPGA中实现。OJSVD算法既可以移植到基于NIOS II处理器的独立系统上,也可以移植到带有拟议协处理器的系统上。结果表明,该协处理器的性能提高了15倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A VLIW-Vector co-processor design for accelerating Basic Linear Algebraic Operations in OpenCV
OpenCV is a widely used computer vision library written in C++. Basic Linear Algebraic Operations (BLAOP) involving matrices are at the heart of OpenCV. Though OpenCV provides ubiquity in the computer vision field, it runs slow when ported on embedded processors. Accelerating the LAOPs using a co-processor certainly helps improving the throughput. In this paper we present a floating point VLIW-Vector Co-processor Architecture with Vector Floating Point Datapath (VFPDP) and a 4-slot VLIW processor core to accelerate BLAOps achieving performance of two GFLOPS when run at 500MHz clock frequency. We also demonstrate a detailed mapping strategy of One sided Jacobi Singular Value Decomposition (OJSVD) algorithm onto the proposed architecture. The proposed architecture is designed using Verilog HDL and it is synthesized using Synopsis Design Compiler with 28nm TSMC target libraries. The clock period is set to 2ns and the timing constraints are met. Using the Altera's SOPC builder, an experimental system is created with the co-processor interfaced to the NIOS II soft processor and implemented in Cyclone IV FPGA. The OJSVD algorithm is ported onto both the standalone NIOS II processor based system and the system with the proposed co-processor. The results show that 15X performance improvement achieved with this co-processor.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信