Closed-Loop Deep Vision

2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA). Pub Date: 2013-12-23. DOI: 10.1109/DICTA.2013.6691492

G. Carneiro, Zhibin Liao, Tat-Jun Chin



Abstract

There has been a resurgence of interest in one of the most fundamental aspects of computer vision: the existence of a feedback mechanism in the inference of a visual classification process. This mechanism was present in the earliest computer vision methodologies, but technical and theoretical issues imposed major roadblocks that forced researchers to seek alternative approaches based on pure feed-forward inference. These open-loop approaches process the input image sequentially with increasingly complex analysis steps, and any mistake made by an intermediate step impairs all subsequent analysis tasks. In contrast, closed-loop approaches that combine feed-forward and feedback mechanisms can fix mistakes made during such intermediate stages. In this paper, we present a new closed-loop inference method for computer vision problems based on an iterative analysis using deep belief networks (DBN). Specifically, an image is processed by a feed-forward mechanism that produces a classification result, which is then used to sample an image from the current belief state of the DBN. The difference between the input image and the sampled image is then fed back to the DBN for re-classification, and this process iterates until convergence. We show that our closed-loop vision inference improves classification results compared with pure feed-forward mechanisms on the MNIST handwritten digit dataset and on the Multiple Object Categories dataset, which contains shapes of horses, dragonflies, llamas and rhinos.
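The loop described in the abstract (classify, sample an image from the current belief, feed the difference back, repeat until convergence) can be illustrated with a toy sketch. This is not the authors' DBN: the abstract does not specify the network architecture or exactly how the difference is fed back. Here `classify`, `generate`, `closed_loop_classify`, the class `templates`, and the `step` mixing rule are all hypothetical stand-ins chosen only to make the control flow concrete.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def classify(image, templates):
    """Hypothetical feed-forward pass: class beliefs from template distances."""
    scores = -((templates - image) ** 2).sum(axis=1)
    return softmax(scores)


def generate(probs, templates):
    """Hypothetical top-down pass: expected image under the current belief."""
    return probs @ templates


def closed_loop_classify(image, templates, step=0.5, tol=1e-6, max_iters=50):
    """Iterate feed-forward classification and feedback until beliefs settle.

    Assumption: the feedback re-classifies the sampled image corrected by a
    fraction `step` of the difference to the input; the paper may differ.
    """
    probs = classify(image, templates)
    iterations = 0
    for iterations in range(1, max_iters + 1):
        sampled = generate(probs, templates)       # image from current belief
        difference = image - sampled               # input vs. sampled image
        fed_back = sampled + step * difference     # assumed feedback rule
        new_probs = classify(fed_back, templates)  # re-classification
        converged = np.abs(new_probs - probs).max() < tol
        probs = new_probs
        if converged:
            break
    return int(probs.argmax()), probs, iterations


if __name__ == "__main__":
    # Three made-up 3-pixel class templates and a noisy input near class 1.
    toy_templates = np.eye(3)
    toy_image = np.array([0.2, 0.9, 0.1])
    label, beliefs, n_iters = closed_loop_classify(toy_image, toy_templates)
    print(label, beliefs, n_iters)
```

In this sketch, convergence is declared when the class beliefs change by less than `tol` between iterations, which is one plausible reading of "iterates until convergence"; the paper's actual stopping criterion is not stated in the abstract.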



