Hybrid aggregation strategy with double inverted residual blocks for lightweight salient object detection

IF 6.3 | Zone 1 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence
Jianhua Ma, Mingfeng Jiang, Xian Fang, Jiatong Chen, Yaming Wang, Guang Yang
Neural Networks, Volume 194, Article 108097. Published 2025-09-10. DOI: 10.1016/j.neunet.2025.108097
https://www.sciencedirect.com/science/article/pii/S0893608025009773
Citations: 0

Abstract

Lightweight salient object detection (SOD) is widely used in downstream applications because of its low resource requirements and fast inference speed. Hybrid encoders offer the potential for a better balance between efficiency and accuracy in the SOD task. However, aggregating features from convolutional neural networks (CNNs) and transformers remains challenging, and existing lightweight SOD models rarely explore efficient aggregation of the cross-architecture features that hybrid encoders produce. In this paper, we propose a hybrid aggregation strategy network (HASNet) that balances accuracy and efficiency for lightweight SOD by grouping and aggregating features to leverage salient information across different architectures. Specifically, the features produced by the hybrid encoder are divided into convolutional features and transformer features, which undergo shallow and deep aggregation, respectively. Deep aggregation uses the global inverted residual block (GIRB) to transfer the salient information encoded in transformer features across levels. Shallow aggregation uses the lightweight inverted residual block (LIRB) to efficiently integrate the spatial information inherent in convolutional features. The GIRB incorporates an efficient global operation to extract channel semantic information from the high-dimensional transformer features. The LIRB fuses low-level features by exploiting their spatial information at extremely low computational cost. Comprehensive experiments across five datasets demonstrate that HASNet significantly outperforms existing methods in an evaluation covering parameter count, inference speed, and accuracy. The source code will be publicly available at https://github.com/LitterMa-820/HASNet.
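The abstract does not describe the internals of the LIRB or the GIRB. As a rough illustration of why inverted residual designs can handle spatial information cheaply, the sketch below counts weights in a generic MobileNetV2-style inverted residual block (1x1 expansion, depthwise k x k convolution, 1x1 projection) and compares its spatial layer with a dense convolution at the same width. The block layout, the function names, and the channel width and expansion ratio are all assumptions chosen for illustration; none of them is taken from the HASNet paper or code.

```python
def inverted_residual_params(c_in: int, c_out: int, expand: int = 4, k: int = 3) -> dict:
    """Weight counts for a generic inverted residual block (bias and norm omitted).

    Layout assumed here: 1x1 expansion -> k x k depthwise conv -> 1x1 projection.
    This is the MobileNetV2-style layout, not a confirmed description of LIRB/GIRB.
    """
    hidden = c_in * expand
    return {
        "expand": c_in * hidden,      # 1x1 conv: c_in -> hidden
        "depthwise": hidden * k * k,  # one k x k filter per hidden channel
        "project": hidden * c_out,    # 1x1 conv: hidden -> c_out
    }


def dense_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count for a dense k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


# Hypothetical sizes, picked only to make the arithmetic concrete.
channels, expand = 64, 4
hidden = channels * expand

block = inverted_residual_params(channels, channels, expand)
block_total = sum(block.values())
dense_spatial = dense_conv_params(hidden, hidden)

print("inverted residual block:", block, "total:", block_total)
print("dense 3x3 conv at the hidden width:", dense_spatial)
print("spatial-layer saving factor:", dense_spatial // block["depthwise"])
```

With these assumed sizes, the depthwise layer needs 2,304 weights where a dense 3x3 convolution at the same hidden width would need 589,824, so the spatial filtering is cheaper by a factor equal to the hidden width (256). Most of the block's remaining cost sits in the two 1x1 convolutions.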
Source journal
Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.