Real-time scene recognition on embedded system with SIFT keypoints and a new descriptor

Han Xiao, Wenhao He, Kui Yuan, Feng Wen
2013 IEEE International Conference on Mechatronics and Automation. Published 2013-10-03. DOI: 10.1109/ICMA.2013.6618104 (https://doi.org/10.1109/ICMA.2013.6618104)
Citations: 12

Abstract

The vision system of a mobile robot has to interpret the environment in real time at low power. As a good algorithm for extracting information from images, SIFT (Scale Invariant Feature Transform) is widely used in computer vision. However, the high computational complexity makes it hard to achieve real-time performance of SIFT with pure software. This paper presents a machine vision system implementing the SIFT algorithm on an embedded image processing card, where real-time scene recognition is accomplished with low power consumption through the cooperation between an FPGA (Field Programmable Gate Array) and a DSP (Digital Signal Processor) chip. The original SIFT keypoint detection algorithm is adapted for parallel computation and implemented with a hardware pipeline in the FPGA. Although our current system is designed for 360×288 video frames, this pipelined architecture can be applied to images with arbitrary resolution. Meanwhile, the original 128-dimensional SIFT descriptor is replaced by an 18-dimensional new descriptor which can be generated more efficiently and can be matched according to an absolute distance threshold with the distance defined by infinity-norm. On this basis, a five-branch-tree data structure is designed for fast searching and matching of descriptors, and robust scene recognition is realized through the combination of keypoints. Since our new descriptor allows one keypoint to be matched to several keypoints, which is a distinct property from the original SIFT algorithm, our system can recognize multiple images with overlapping contents simultaneously. In addition, compared with traditional work that needs off-line training, our system can perform fast on-line learning, which is a desirable property for mobile robots.
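The matching rule described in the abstract (accept a match when the infinity-norm distance between two descriptors falls under an absolute threshold, and allow one keypoint to match several stored keypoints) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold value, and the toy descriptor values are assumptions made for illustration, and a plain linear scan stands in for the five-branch-tree search, whose layout the abstract does not specify.

```python
from typing import List, Sequence


def linf_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Infinity-norm distance: the largest absolute per-component difference."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimension")
    return max(abs(x - y) for x, y in zip(a, b))


def match_all(query: Sequence[float],
              database: Sequence[Sequence[float]],
              threshold: float) -> List[int]:
    """Return the index of EVERY stored descriptor within the threshold.

    Unlike a nearest-neighbour ratio test, an absolute threshold lets a
    single query keypoint match several stored keypoints at once.
    """
    return [i for i, d in enumerate(database)
            if linf_distance(query, d) <= threshold]


# Hypothetical 18-dimensional descriptors (values chosen only for the demo).
DIM = 18
query = [0.0] * DIM
database = [
    [1.0] * DIM,                    # every component differs by 1
    [0.0] * (DIM - 1) + [5.0],      # one component differs by 5
    [2.0] * DIM,                    # every component differs by 2
]

matches = match_all(query, database, threshold=2.0)
print(matches)
```

Because the infinity norm is decided by the single worst component, the second descriptor is rejected even though 17 of its 18 components agree exactly with the query, while the first and third are both accepted as matches for the same keypoint.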