Ying Xie, Jixiang Wang, Zhiqiang Xu, Junnan Shen, Lijie Wen, Rongbin Xu, Hang Xu, Yun Yang
Title: Alignable kernel network
Journal: Information Fusion, Volume 115, Article 102758 (JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 14.7)
DOI: 10.1016/j.inffus.2024.102758
Publication date: 2024-10-28 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S1566253524005360
Citations: 0
Abstract
To enhance the adaptability and performance of Convolutional Neural Networks (CNNs), we present an adaptable mechanism called the Alignable Kernel (AliK) unit, which dynamically adjusts the receptive field (RF) dimensions of a model in response to varying stimuli. The branches of the AliK unit are integrated through a novel align-transformation softmax attention, which incorporates prior knowledge through rank-ordering constraints. The attention weightings across the branches establish the effective RF scales leveraged by neurons in the fusion layer. This mechanism is inspired by neuroscientific observations indicating that the RF dimensions of neurons in the visual cortex vary with the stimulus, a feature often overlooked in CNN architectures. By aggregating successive AliK ensembles, we develop a deep network architecture named the Alignable Kernel Network (AliKNet). AliKNet's interdisciplinary design improves the network's performance and interpretability by drawing direct inspiration from the structure and function of human neural systems, especially the visual cortex. Empirical evaluations on image classification and semantic segmentation demonstrate that AliKNet outperforms numerous state-of-the-art architectures without increasing model complexity. Furthermore, we demonstrate that AliKNet can identify target objects across various scales, confirming its ability to dynamically adapt its RF sizes in response to the input data.
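The abstract does not give implementation details of the AliK unit, but its core fusion idea (softmax attention weights computed across branches with different kernel sizes, selecting the effective RF scale per channel) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's method: the function and variable names are hypothetical, the rank-ordering constraint and the align transformation are omitted, and per-channel logits stand in for whatever attention network the paper actually uses.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_branches(branches, logits):
    """Fuse K branch feature maps with branch-wise softmax attention.

    branches: list of K arrays, each of shape (B, C, H, W), e.g. outputs
              of convolutions with different kernel sizes (different RFs).
    logits:   (K, C) unnormalized attention scores per branch and channel.
    Returns a fused (B, C, H, W) map: each channel is a convex combination
    of the branches, so the attention weights set its effective RF scale.
    """
    weights = softmax(logits, axis=0)        # normalize across branches
    stacked = np.stack(branches, axis=0)     # (K, B, C, H, W)
    # Broadcast the per-channel weights over batch and spatial dimensions.
    return (weights[:, None, :, None, None] * stacked).sum(axis=0)

# Toy example: two branches (e.g. 3x3 and 5x5 kernels), 2 channels.
rng = np.random.default_rng(0)
b1 = rng.standard_normal((1, 2, 4, 4))
b2 = rng.standard_normal((1, 2, 4, 4))
# Channel 0 weights the branches equally; channel 1 favors branch 1 (small RF).
logits = np.array([[0.0, 2.0],
                   [0.0, -2.0]])
fused = fuse_branches([b1, b2], logits)
```

Because the weights are a softmax over branches, each output channel is a convex combination of the branch responses; a channel whose logits favor the large-kernel branch effectively sees a larger RF, which is how input-dependent logits would let the unit adapt its RF to the stimulus.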
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.