Human Eyes–Inspired Recurrent Neural Networks Are More Robust Against Adversarial Noises

IF 2.7; Zone 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Neural Computation Pub Date : 2024-08-19 DOI:10.1162/neco_a_01688

Minkyu Choi;Yizhen Zhang;Kuan Han;Xiaokai Wang;Zhongming Liu

Neural Computation, vol. 36, no. 9, pp. 1713-1743. Listing URL: https://ieeexplore.ieee.org/document/10661266/

Citations: 0

Abstract

Humans actively observe the visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNN) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one determining the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention and that the model can enhance robustness against adversarial attacks due to its retinal sampling and recurrent processing. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.
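The fixation loop described in the abstract can be pictured with a small toy example. The following is a minimal sketch under stated assumptions, not the authors' model: the function names (`retinal_offsets`, `take_glimpse`, `explore`), the power-law sampling grid, the random untrained weights, and the "move to the brightest sampled pixel" rule are all hypothetical stand-ins chosen only to show how retina-like sampling, a "where" stream, and a recurrent "what" stream might interact over a sequence of glimpses.

```python
import numpy as np


def retinal_offsets(size, power=2.0):
    """1-D sampling offsets in [-1, 1], denser near 0 (the assumed fovea)."""
    u = np.linspace(-1.0, 1.0, size)
    # Power-law warp (an assumption): samples cluster around the fixation
    # point and thin out toward the periphery.
    return np.sign(u) * np.abs(u) ** power


def take_glimpse(image, fixation, size=9, radius=0.5):
    """Sample a size x size retina-like patch around `fixation`.

    `fixation` is (row, col) in normalized [0, 1] coordinates.
    Returns the patch and the pixel rows/cols that were sampled.
    """
    h, w = image.shape
    off = retinal_offsets(size) * radius
    rows = np.clip(np.rint((fixation[0] + off) * (h - 1)), 0, h - 1).astype(int)
    cols = np.clip(np.rint((fixation[1] + off) * (w - 1)), 0, w - 1).astype(int)
    return image[np.ix_(rows, cols)], rows, cols


def explore(image, n_fixations=4, size=9, hidden=16, seed=0):
    """Run a toy sequence of fixations and return (state, fixation path)."""
    rng = np.random.default_rng(seed)
    # Random, untrained weights: placeholders for a learned recurrent network.
    w_in = rng.normal(scale=0.1, size=(hidden, size * size))
    w_rec = rng.normal(scale=0.1, size=(hidden, hidden))
    state = np.zeros(hidden)
    fixation = np.array([0.5, 0.5])  # start at the image centre
    path = [tuple(fixation)]
    h, w = image.shape
    for _ in range(n_fixations):
        patch, rows, cols = take_glimpse(image, fixation, size)
        # "What" stream stand-in: fold the glimpse into the recurrent state.
        state = np.tanh(w_in @ patch.ravel() + w_rec @ state)
        # "Where" stream stand-in: fixate the brightest sampled pixel next.
        r, c = np.unravel_index(np.argmax(patch), patch.shape)
        fixation = np.array([rows[r] / (h - 1), cols[c] / (w - 1)])
        path.append(tuple(fixation))
    return state, path


if __name__ == "__main__":
    img = np.zeros((33, 33))
    img[7, 25] = 1.0  # a single "salient" pixel
    final_state, gaze_path = explore(img)
    print(gaze_path)
```

In this toy setup the representation is built up across glimpses because each new patch is combined with the previous state rather than replacing it, which is the property the abstract attributes to recurrent processing. A trained model would replace both stand-in rules with learned networks.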

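Source journal

Neural Computation (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore

6.30

Self-citation rate

3.40%

Articles published

Review time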

3.0 months

Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.