Title: Multi-graph mutual learning network with cross-modal feature fusion for video salient object detection
Authors: Bing Liu, Zushuang Liang, Lina Gao, Yulong Huang, Tiantian Wang
Journal: Digital Signal Processing, Volume 168, Article 105485
DOI: 10.1016/j.dsp.2025.105485
Publication date: 2025-07-21
URL: https://www.sciencedirect.com/science/article/pii/S105120042500507X
Citations: 0
Abstract
Video salient object detection (VSOD) aims to identify and highlight the most visually compelling and motion-related elements within video sequences, and serves as a crucial preprocessing step for intelligent video analysis. However, due to inadequate spatiotemporal cross-modal feature fusion and suboptimal capture of salient structure information, existing detection methods perform poorly in many complex scenes, which typically involve moving but non-salient objects in the background or rapid motion changes in foreground objects. To address this issue, we propose a multi-graph mutual learning network with cross-modal feature fusion for VSOD. Specifically, we design a cross-attention module (CAM) to effectively fuse spatiotemporal modal features, and we devise a multi-scale feature fusion module (MFFM) to fully integrate multi-scale features from different feature extraction layers. Finally, we propose a multi-graph mutual learning network (MGMLN) to improve the integrity and continuity of object structural information. Extensive experiments were conducted on four commonly used VSOD test datasets. When benchmarked against 21 state-of-the-art (SOTA) VSOD models, our method accurately predicts the most salient objects and maintains coherent details within complex dynamic visual scenes.
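To make the cross-modal fusion idea concrete, the sketch below shows one common way to fuse an appearance (RGB) stream with a motion (optical-flow) stream using cross-attention, in the spirit of the cross-attention module (CAM) described in the abstract. It is a minimal illustration, not the authors' implementation: the class name, feature shapes, number of heads, and the final 1x1 convolution fusion are all assumptions.

```python
# Hypothetical sketch of cross-modal feature fusion via cross-attention.
# Each modality queries the other, and the two attended streams are merged
# by a 1x1 convolution. Shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.rgb_to_flow = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.flow_to_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, flow_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, flow_feat: (B, C, H, W) feature maps from the two streams.
        b, c, h, w = rgb_feat.shape
        rgb_tokens = rgb_feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
        flow_tokens = flow_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Each modality attends to the other; residual connections keep the
        # original stream's information.
        rgb_attended, _ = self.rgb_to_flow(rgb_tokens, flow_tokens, flow_tokens)
        flow_attended, _ = self.flow_to_rgb(flow_tokens, rgb_tokens, rgb_tokens)
        rgb_out = (rgb_tokens + rgb_attended).transpose(1, 2).reshape(b, c, h, w)
        flow_out = (flow_tokens + flow_attended).transpose(1, 2).reshape(b, c, h, w)

        # Concatenate the two attended streams and project back to C channels.
        return self.fuse(torch.cat([rgb_out, flow_out], dim=1))


if __name__ == "__main__":
    fusion = CrossModalAttentionFusion(channels=64)
    rgb = torch.randn(2, 64, 28, 28)
    flow = torch.randn(2, 64, 28, 28)
    print(fusion(rgb, flow).shape)  # torch.Size([2, 64, 28, 28])
```

A module of this kind would typically be applied at each encoder level before multi-scale aggregation (the role played by the MFFM in the paper), but the exact placement and design here are assumptions for illustration.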
Journal description:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing, including seismic signal processing
• cheminformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy