Hybrid aggregation strategy with double inverted residual blocks for lightweight salient object detection

IF 6.3 | CAS Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks | Pub Date: 2025-09-10 | DOI: 10.1016/j.neunet.2025.108097

Jianhua Ma, Mingfeng Jiang, Xian Fang, Jiatong Chen, Yaming Wang, Guang Yang

Neural Networks, Volume 194, Article 108097. Article page: https://www.sciencedirect.com/science/article/pii/S0893608025009773

Citations: 0

Abstract

Lightweight salient object detection (SOD) is widely used in various downstream applications due to its low resource requirements and fast inference speed. Hybrid encoders offer the potential to achieve a better balance between efficiency and accuracy for the SOD task. However, aggregating features from convolutional neural networks (CNNs) and transformers remains challenging, and most existing lightweight SOD models rarely explore the efficient aggregation of cross-architecture features derived from hybrid encoders. In this paper, we propose a hybrid aggregation strategy network (HASNet) that balances accuracy and efficiency for lightweight SOD by grouping and aggregating features to leverage salient information across different architectures. Specifically, the features produced by the hybrid encoder are divided into convolutional and transformer features for shallow and deep aggregation, respectively. Deep aggregation uses the global inverted residual block (GIRB) to facilitate the transfer of salient information encoded within transformer features across various levels. Meanwhile, shallow aggregation uses the lightweight inverted residual block (LIRB) to efficiently integrate the spatial information inherent in convolutional features. The GIRB incorporates an efficient global operation to extract channel semantic information from the high-dimensional transformer features. The LIRB fuses low-level features by efficiently exploiting their spatial information at extremely low computational cost. Comprehensive experiments conducted across five datasets demonstrate that HASNet significantly outperforms existing methods in a thorough evaluation encompassing parameter size, inference speed, and accuracy. The source code will be publicly available at https://github.com/LitterMa-820/HASNet.
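The abstract does not describe the internal layout of the GIRB or LIRB, so the following is only a back-of-the-envelope sketch, assuming the generic MobileNetV2-style inverted residual (1×1 expansion, depthwise k×k convolution, 1×1 projection) rather than the authors' exact blocks. It counts weights and multiply-accumulates (MACs) to show why inverted residual designs can stay cheap on high-resolution shallow features. The function names, the expansion ratio, and the 64-channel, 88×88 feature map are hypothetical choices for illustration; biases and normalization layers are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvCost:
    params: int  # learnable weights (biases and normalization layers ignored)
    macs: int    # multiply-accumulates for one H x W output map, stride 1


def standard_conv_cost(c_in: int, c_out: int, k: int, h: int, w: int) -> ConvCost:
    """Dense k x k convolution mapping c_in channels to c_out channels."""
    params = k * k * c_in * c_out
    # With stride 1 and 'same' padding, every weight is applied once per output pixel.
    return ConvCost(params, params * h * w)


def inverted_residual_cost(c_in: int, c_out: int, k: int, h: int, w: int,
                           expansion: int = 4) -> ConvCost:
    """Generic inverted residual (assumed layout): 1x1 expand -> depthwise k x k -> 1x1 project.

    The residual addition itself has no weights, so it adds nothing to params.
    """
    hidden = expansion * c_in
    expand = c_in * hidden      # pointwise conv that widens the channels
    depthwise = k * k * hidden  # one k x k filter per hidden channel
    project = hidden * c_out    # pointwise conv that narrows back down
    params = expand + depthwise + project
    # All three convs run at stride 1 on the same H x W grid.
    return ConvCost(params, params * h * w)


if __name__ == "__main__":
    # Hypothetical shallow-stage setting: 64 channels on an 88 x 88 feature map.
    dense = standard_conv_cost(64, 64, k=3, h=88, w=88)
    inverted = inverted_residual_cost(64, 64, k=3, h=88, w=88, expansion=2)
    print(f"dense 3x3 conv : {dense.params:,} params, {dense.macs / 1e6:.1f} M MACs")
    print(f"inverted block : {inverted.params:,} params, {inverted.macs / 1e6:.1f} M MACs")
```

With these hypothetical settings the inverted block uses 17,536 weights versus 36,864 for a dense 3×3 convolution of the same width. At an expansion ratio of 4 the two are nearly equal (35,072 weights), so any savings depend on how the expansion ratio and the depthwise stage are configured.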

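Source journal

Neural Networks (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore

13.90

Self-citation rate

7.70%

Articles published

425

Review time

67 days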

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.