Real-time scene recognition on embedded system with SIFT keypoints and a new descriptor

2013 IEEE International Conference on Mechatronics and Automation. Publication date: 2013-10-03. DOI: 10.1109/ICMA.2013.6618104

Han Xiao, Wenhao He, Kui Yuan, Feng Wen


Citations: 12

Abstract

The vision system of a mobile robot must interpret its environment in real time and at low power. SIFT (Scale Invariant Feature Transform) is an effective algorithm for extracting features from images and is widely used in computer vision; however, its high computational complexity makes real-time performance difficult to achieve in pure software. This paper presents a machine vision system that implements the SIFT algorithm on an embedded image processing card, achieving real-time scene recognition at low power consumption through cooperation between an FPGA (Field Programmable Gate Array) and a DSP (Digital Signal Processor). The original SIFT keypoint detection algorithm is adapted for parallel computation and implemented as a hardware pipeline in the FPGA. Although the current system is designed for 360×288 video frames, the pipelined architecture can be applied to images of arbitrary resolution. In addition, the original 128-dimensional SIFT descriptor is replaced by a new 18-dimensional descriptor that can be generated more efficiently and matched against an absolute distance threshold, with distance defined by the infinity norm. On this basis, a five-branch tree data structure is designed for fast searching and matching of descriptors, and robust scene recognition is achieved by combining the matches of multiple keypoints. Because the new descriptor allows one keypoint to be matched to several keypoints, a property that distinguishes it from the original SIFT algorithm, the system can recognize multiple images with overlapping content simultaneously. Finally, unlike traditional approaches that require off-line training, the system supports fast on-line learning, which is a desirable property for mobile robots.
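
The abstract does not say exactly how the keypoint detector was adapted for the FPGA pipeline. For reference, the step it starts from in standard SIFT is the scale-space extremum test: a sample of a difference-of-Gaussians (DoG) layer becomes a candidate keypoint when it is strictly larger or strictly smaller than all 26 neighbours in the 3×3×3 cube spanning that layer and the two adjacent ones. The Python sketch below shows that test in plain, sequential form. The function names and the optional `threshold` contrast cut-off are illustrative assumptions, not the authors' design. Because each test only needs a 3×3 window from three layers, every pixel can be evaluated independently, which is one plausible reason this step maps well onto a streaming hardware pipeline.

```python
from typing import List, Tuple

Image = List[List[float]]  # row-major 2-D array of samples


def difference_of_gaussians(blur_hi: Image, blur_lo: Image) -> Image:
    """D = L(k*sigma) - L(sigma), computed sample by sample from two
    Gaussian-blurred copies of the same image."""
    return [[a - b for a, b in zip(row_hi, row_lo)]
            for row_hi, row_lo in zip(blur_hi, blur_lo)]


def is_dog_extremum(below: Image, here: Image, above: Image,
                    r: int, c: int, threshold: float = 0.0) -> bool:
    """True if here[r][c] is strictly greater than, or strictly smaller than,
    all 26 neighbours in the 3x3x3 cube over three adjacent DoG layers.
    Border samples are rejected; `threshold` is a hypothetical |D| cut-off."""
    rows, cols = len(here), len(here[0])
    if not (1 <= r < rows - 1 and 1 <= c < cols - 1):
        return False
    v = here[r][c]
    if abs(v) < threshold:
        return False  # low-contrast sample (hypothetical cut-off)
    is_max = is_min = True
    for k, layer in enumerate((below, here, above)):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if k == 1 and dr == 0 and dc == 0:
                    continue  # skip the centre sample itself
                n = layer[r + dr][c + dc]
                is_max = is_max and n < v
                is_min = is_min and n > v
                if not (is_max or is_min):
                    return False  # early exit: neither extremum is possible
    return True


def detect_candidates(below: Image, here: Image, above: Image,
                      threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Scan all interior samples; each test is independent of the others."""
    return [(r, c)
            for r in range(1, len(here) - 1)
            for c in range(1, len(here[0]) - 1)
            if is_dog_extremum(below, here, above, r, c, threshold)]
```

For example, with three 3×3 layers that are all zero except for a value of 5.0 at the centre of the middle layer, `detect_candidates(zeros, peak, zeros)` returns `[(1, 1)]`. A full SIFT detector would follow this with sub-pixel refinement and edge-response rejection, which the sketch omits.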

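The matching rule stated in the abstract can be written precisely: two descriptors a and b in R^18 match when ‖a − b‖∞ = max over i of |a_i − b_i| is at most a fixed threshold T. Because this is an absolute threshold rather than the nearest-neighbour ratio test of the original SIFT, a query keypoint may match any number of stored keypoints, and the component-wise check can stop at the first coordinate whose difference exceeds T. The Python sketch below illustrates that rule together with one possible way of combining keypoint matches into scene votes. Everything beyond the L∞ threshold rule is an assumption: the `SceneDatabase` class, its `threshold` and `min_votes` parameters, and the voting scheme are hypothetical, and a linear scan stands in for the paper's five-branch tree, whose structure the abstract does not describe.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

DIM = 18  # descriptor length stated in the paper


def linf_within(a: Sequence[float], b: Sequence[float], t: float) -> bool:
    """True if max_i |a[i] - b[i]| <= t; returns early on the first
    component whose difference exceeds the threshold."""
    for x, y in zip(a, b):
        if abs(x - y) > t:
            return False
    return True


class SceneDatabase:
    """Hypothetical store of (scene_id, descriptor) pairs. A linear scan
    stands in for the paper's five-branch tree."""

    def __init__(self, threshold: float, min_votes: int):
        self.threshold = threshold  # absolute L-infinity threshold T
        self.min_votes = min_votes  # hypothetical recognition criterion
        self.entries: List[Tuple[str, Tuple[float, ...]]] = []

    def learn(self, scene_id: str, descriptors: List[Sequence[float]]) -> None:
        """On-line learning in this sketch: append the new scene's descriptors."""
        for d in descriptors:
            if len(d) != DIM:
                raise ValueError(f"expected a {DIM}-dimensional descriptor")
            self.entries.append((scene_id, tuple(d)))

    def recognize(self, descriptors: List[Sequence[float]]) -> Set[str]:
        """Each query keypoint votes once for every scene that has at least one
        stored descriptor within T of it; all scenes reaching min_votes are
        returned, so overlapping scenes can be reported together."""
        votes: Dict[str, int] = defaultdict(int)
        for q in descriptors:
            hits = {sid for sid, d in self.entries
                    if linf_within(q, d, self.threshold)}
            for sid in hits:
                votes[sid] += 1
        return {sid for sid, n in votes.items() if n >= self.min_votes}
```

For example, with `threshold=2.0` and `min_votes=2`, a database holding a "corridor" scene and an overlapping "door" scene can report both at once when a single query keypoint lies within T of descriptors from each. Learning a new scene here is only an append, with no retraining step, which mirrors the on-line learning property described above without claiming to reproduce the authors' procedure.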