Enabling scale and rotation invariance in convolutional neural networks with retina like transformation

Impact factor: 6.0 | Zone 1, Computer Science | JCR Q1: Computer Science, Artificial Intelligence

Neural Networks Pub Date : 2025-03-20 DOI:10.1016/j.neunet.2025.107395

Jiahong Zhang, Guoqi Li, Qiaoyi Su, Lihong Cao, Yonghong Tian, Bo Xu

Neural Networks, volume 187, Article 107395. Journal article, not open access. Full text: https://www.sciencedirect.com/science/article/pii/S0893608025002746

Citations: 0

Abstract

Traditional convolutional neural networks (CNNs) struggle with scale and rotation transformations, resulting in reduced performance on transformed images. Previous research focused on designing specific CNN modules to extract transformation-invariant features. However, these methods lack versatility and are not adaptable to a wide range of scenarios. Drawing inspiration from human visual invariance, we propose a novel brain-inspired approach to tackle the invariance problem in CNNs. If we consider a CNN as the visual cortex, we have the potential to design an “eye” that exhibits transformation invariance, allowing CNNs to perceive the world consistently. Therefore, we propose a retina module and then integrate it into CNNs to create transformation-invariant CNNs (TICNN), achieving scale and rotation invariance. The retina module comprises a retina-like transformation and a transformation-aware neural network (TANN). The retina-like transformation supports flexible image transformations, while the TANN regulates these transformations for scaling and rotation. Specifically, we propose a reference-based training method (RBTM) where the retina module learns to align input images with a reference scale and rotation, thereby achieving invariance. Furthermore, we provide mathematical substantiation for the retina module to confirm its feasibility. Experimental results also demonstrate that our method outperforms existing methods in recognizing images with scale and rotation variations. The code will be released at https://github.com/JiaHongZ/TICNN.
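The alignment idea in the abstract (bring every input to a reference scale and rotation before the CNN sees it) can be illustrated with a minimal sketch. This is not the authors' retina module: a plain similarity warp with nearest-neighbour sampling stands in for the retina-like transformation, and the scale and angle that the TANN would predict are passed in by hand. The function names `similarity_warp` and `align_to_reference` and all parameters are hypothetical.

```python
import numpy as np


def similarity_warp(image, scale, angle_rad):
    """Scale and rotate a 2-D array about its centre (nearest neighbour).

    Stand-in for the retina-like transformation; the paper's actual
    sampling scheme is not described in the abstract.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    # Inverse mapping: for each output pixel, locate the source pixel by
    # undoing the rotation and the scaling.
    src_x = (c * dx + s * dy) / scale + cx
    src_y = (-s * dx + c * dy) / scale + cy
    xi = np.rint(src_x).astype(int)
    yi = np.rint(src_y).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(image)
    out[inside] = image[yi[inside], xi[inside]]
    return out


def align_to_reference(image, est_scale, est_angle,
                       ref_scale=1.0, ref_angle=0.0):
    """Warp `image` so its estimated scale and rotation match the reference.

    `est_scale` and `est_angle` are assumed to come from some estimator
    (the TANN in the paper); here they are supplied by the caller.
    """
    return similarity_warp(image, ref_scale / est_scale, ref_angle - est_angle)


# A small asymmetric pattern, then the same pattern seen at 2x scale and
# rotated by 90 degrees.
reference_view = np.zeros((9, 9))
reference_view[3:6, 3:6] = np.arange(1, 10).reshape(3, 3)
transformed_view = similarity_warp(reference_view, 2.0, np.pi / 2)

# With correct estimates, alignment maps the transformed view back to the
# reference view, so a downstream classifier would see a consistent input.
aligned_view = align_to_reference(transformed_view, 2.0, np.pi / 2)
```

In this toy case the pattern is small enough to stay inside the frame, so the aligned view matches the reference view exactly. With real images, interpolation and cropping would make the match approximate, and the quality of the result would depend on how well the scale and angle are estimated.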

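Source journal

Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days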

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.