Jianhua Ma, Mingfeng Jiang, Xian Fang, Jiatong Chen, Yaming Wang, Guang Yang
Neural Networks, Volume 194, Article 108097. Published 2025-09-10. DOI: 10.1016/j.neunet.2025.108097. Impact factor 6.3 (JCR Q1, Computer Science, Artificial Intelligence). Article page: https://www.sciencedirect.com/science/article/pii/S0893608025009773
Hybrid aggregation strategy with double inverted residual blocks for lightweight salient object detection
Lightweight salient object detection (SOD) is widely used in downstream applications because of its low resource requirements and fast inference speed. Hybrid encoders offer the potential for a better balance between efficiency and accuracy in the SOD task. However, aggregating features from convolutional neural networks (CNNs) and transformers remains challenging, and existing lightweight SOD models rarely explore the efficient aggregation of cross-architecture features derived from hybrid encoders. In this paper, we propose a hybrid aggregation strategy network (HASNet) that balances accuracy and efficiency for lightweight SOD by grouping and aggregating features to leverage salient information across different architectures. Specifically, the features produced by the hybrid encoder are divided into convolutional and transformer features for shallow and deep aggregation, respectively. Deep aggregation uses the global inverted residual block (GIRB) to facilitate the transfer of salient information encoded within transformer features across levels. Meanwhile, shallow aggregation uses the lightweight inverted residual block (LIRB) to efficiently integrate the spatial information inherent in convolutional features. The GIRB incorporates an efficient global operation to extract channel semantic information from the high-dimensional transformer features. The LIRB fuses low-level features by exploiting their spatial information at extremely low computational cost. Comprehensive experiments on five datasets demonstrate that HASNet significantly outperforms existing methods in an evaluation covering parameter count, inference speed, and accuracy. The source code will be publicly available at https://github.com/LitterMa-820/HASNet.
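The abstract does not give the internal design of the GIRB or LIRB, so the following is only a rough sketch of the general ideas it names, not the authors' implementation. It assumes a generic MobileNetV2-style inverted residual layout (1x1 expand, depthwise convolution, 1x1 project), uses global average pooling as a stand-in for the "global operation", and picks an arbitrary split point between convolutional and transformer levels. All function names, channel widths, and the expansion factor are hypothetical.

```python
def split_levels(levels, num_shallow):
    """Group encoder outputs into shallow (convolutional) and deep
    (transformer) sets. The split point is an assumption; the abstract
    does not say how many levels belong to each group."""
    return levels[:num_shallow], levels[num_shallow:]


def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def inverted_residual_params(channels, expand=4, k=3):
    """Generic inverted residual block: 1x1 expand, k x k depthwise,
    1x1 project. Normalization layers are ignored."""
    hidden = channels * expand
    return (conv_params(channels, hidden, 1)                  # expand
            + conv_params(hidden, hidden, k, groups=hidden)   # depthwise
            + conv_params(hidden, channels, 1))               # project


def global_channel_descriptor(feature):
    """Global average pooling over a C x H x W nested list: one scalar
    per channel, used here as a stand-in for a global operation."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature]


if __name__ == "__main__":
    shallow, deep = split_levels(["f1", "f2", "f3", "f4"], num_shallow=2)
    print(shallow, deep)                    # ['f1', 'f2'] ['f3', 'f4']
    print(inverted_residual_params(64))     # 35072
    print(conv_params(256, 256, 3))         # 589824
    print(global_channel_descriptor([[[1, 2], [3, 4]], [[0, 0], [0, 8]]]))
```

Under these assumptions, a block that works at an expanded width of 256 channels needs 35,072 weights, against 589,824 for a dense 3x3 convolution at the same width. This illustrates why inverted residual designs are a common choice when the parameter budget is tight.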
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.