Real-time scene recognition on embedded system with SIFT keypoints and a new descriptor
Han Xiao, Wenhao He, Kui Yuan, Feng Wen
2013 IEEE International Conference on Mechatronics and Automation, 2013-10-03
DOI: 10.1109/ICMA.2013.6618104
Citation count: 12
Abstract
The vision system of a mobile robot must interpret its environment in real time and at low power. SIFT (Scale Invariant Feature Transform) is widely used in computer vision to extract features from images, but its high computational complexity makes real-time performance hard to achieve in software alone. This paper presents a machine vision system that implements the SIFT algorithm on an embedded image processing card, achieving real-time scene recognition with low power consumption through cooperation between an FPGA (Field Programmable Gate Array) and a DSP (Digital Signal Processor) chip. The original SIFT keypoint detection algorithm is adapted for parallel computation and implemented as a hardware pipeline in the FPGA. Although our current system is designed for 360×288 video frames, this pipelined architecture can be applied to images of arbitrary resolution. In addition, the original 128-dimensional SIFT descriptor is replaced by a new 18-dimensional descriptor that can be generated more efficiently and matched against an absolute distance threshold, with distance defined by the infinity norm. On this basis, a five-branch tree data structure is designed for fast searching and matching of descriptors, and robust scene recognition is achieved by combining keypoints. Because our new descriptor allows one keypoint to be matched to several keypoints, a property that distinguishes it from the original SIFT algorithm, our system can recognize multiple images with overlapping content simultaneously. Moreover, unlike traditional approaches that require offline training, our system supports fast online learning, which is a desirable property for mobile robots.
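To illustrate what matching "against an absolute distance threshold, with distance defined by the infinity norm" means, here is a minimal Python sketch. It assumes descriptors are plain lists of 18 floats. The function names `linf_within` and `match_all` and the threshold value are hypothetical, and the linear scan is only a stand-in for the paper's five-branch tree, whose layout the abstract does not describe.

```python
from typing import List, Sequence

# The abstract states the new descriptor is 18-dimensional.
DESCRIPTOR_DIM = 18


def linf_within(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Return True if max_i |a_i - b_i| <= threshold (infinity-norm distance).

    Exits early as soon as one coordinate differs by more than the threshold,
    because the infinity norm can only grow as more coordinates are examined.
    """
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimension")
    for x, y in zip(a, b):
        if abs(x - y) > threshold:
            return False
    return True


def match_all(query: Sequence[float],
              database: List[Sequence[float]],
              threshold: float) -> List[int]:
    """Return the indices of every stored descriptor within `threshold` of `query`.

    An absolute threshold (rather than SIFT's nearest-neighbour ratio test)
    lets one query keypoint match several stored keypoints.
    Hypothetical linear scan; the paper uses a five-branch tree instead.
    """
    return [i for i, d in enumerate(database) if linf_within(query, d, threshold)]


if __name__ == "__main__":
    q = [10.0] * DESCRIPTOR_DIM
    db = [
        [10.0] * DESCRIPTOR_DIM,                   # identical: distance 0
        [12.0] * DESCRIPTOR_DIM,                   # every coordinate off by 2
        [10.0] * (DESCRIPTOR_DIM - 1) + [20.0],    # one coordinate off by 10
    ]
    # Hypothetical threshold of 3.0: the first two stored descriptors match.
    print(match_all(q, db, threshold=3.0))  # [0, 1]
```

Under the infinity norm, a candidate is accepted only if every coordinate individually lies within the threshold, so a search can reject a candidate after inspecting a single dimension. This per-dimension test is a plausible reason the norm suits tree-based pruning, though the abstract does not say how the five-branch tree exploits it.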