{"title":"Learning Stage-wise Fusion Transformer for light field saliency detection","authors":"Wenhui Jiang , Qi Shu , Hongwei Cheng , Yuming Fang , Yifan Zuo , Xiaowei Zhao","doi":"10.1016/j.patrec.2025.07.005","DOIUrl":null,"url":null,"abstract":"<div><div>Light field salient object detection (SOD) has attracted tremendous research efforts recently. As the light field data contains multiple images with different characteristics, effectively integrating the valuable information from these images remains under-explored. Recent efforts focus on aggregating the complementary information from all-in-focus (AiF) and focal stack images (FS) late in the decoding stage. In this paper, we explore how learning the AiF and FS image encoders jointly can strengthen light field SOD. Towards this goal, we propose a Stage-wise Fusion Transformer (SF-Transformer) to aggregate the rich information from AiF image and FS images at different levels. Specifically, we present a Focal Stack Transformer (FST) for focal stacks encoding, which makes full use of the spatial-stack correlations for performant FS representation. We further introduce a Stage-wise Deep Fusion (SDF) which refines both AiF and FS image representation by capturing the multi-modal feature interactions in each encoding stage, thus effectively exploring the advantages of AiF and FS characteristics. We conduct comprehensive experiments on DUT-LFSD, HFUT-LFSD, and LFSD. 
The experimental results validate the effectiveness of the proposed method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 117-123"},"PeriodicalIF":3.3000,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865525002570","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citation count: 0
Abstract
Light field salient object detection (SOD) has attracted tremendous research effort recently. As light field data contains multiple images with different characteristics, effectively integrating the valuable information from these images remains under-explored. Recent efforts focus on aggregating the complementary information from all-in-focus (AiF) and focal stack (FS) images late in the decoding stage. In this paper, we explore how jointly learning the AiF and FS image encoders can strengthen light field SOD. Towards this goal, we propose a Stage-wise Fusion Transformer (SF-Transformer) to aggregate the rich information from the AiF image and FS images at different levels. Specifically, we present a Focal Stack Transformer (FST) for focal stack encoding, which makes full use of spatial-stack correlations to build a performant FS representation. We further introduce Stage-wise Deep Fusion (SDF), which refines both the AiF and FS image representations by capturing multi-modal feature interactions at each encoding stage, thus effectively exploiting the complementary characteristics of AiF and FS images. We conduct comprehensive experiments on DUT-LFSD, HFUT-LFSD, and LFSD. The experimental results validate the effectiveness of the proposed method.
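The abstract describes two components: stack-aware encoding of the focal stack (FST) and per-stage cross-modal refinement of the AiF and FS streams (SDF). The paper's actual architecture is not given here, so the sketch below is only a minimal NumPy illustration of the general idea, not the authors' method: the focal stack is attention-pooled over its stack axis, and at every encoder stage each modality is refined with a cross-attention-style mixing of the other. All function names (`stack_pool`, `cross_fuse`, `sdf_stage`) and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stack_pool(fs_feat):
    """Attention-pool FS features over the focal-stack axis.
    fs_feat: (S, N, C) slices x tokens x channels -> (N, C)."""
    scores = fs_feat.mean(axis=-1)               # (S, N) per-slice saliency proxy
    w = softmax(scores, axis=0)                  # weights over the S focal slices
    return (w[..., None] * fs_feat).sum(axis=0)  # (N, C)

def cross_fuse(a, b):
    """Cross-attention-style refinement of stream a by stream b.
    a, b: (N, C); returns a residual-updated (N, C)."""
    attn = softmax(a @ b.T / np.sqrt(a.shape[-1]), axis=-1)  # (N, N)
    return a + attn @ b

def sdf_stage(aif, fs):
    """One illustrative stage-wise fusion step: both modalities
    exchange information before the next encoder stage."""
    fs_pooled = stack_pool(fs)                       # (N, C)
    aif_new = cross_fuse(aif, fs_pooled)             # AiF refined by FS
    fs_new = fs + cross_fuse(fs_pooled, aif)[None]   # FS refined by AiF, broadcast over S
    return aif_new, fs_new

# Toy run: 3 encoder stages, stack of 4 focal slices, 16 tokens, 8 channels.
aif = rng.standard_normal((16, 8))
fs = rng.standard_normal((4, 16, 8))
for _ in range(3):
    aif, fs = sdf_stage(aif, fs)
print(aif.shape, fs.shape)  # (16, 8) (4, 16, 8)
```

Note the design point the abstract emphasizes: fusion happens inside every encoding stage (the loop), rather than once in a late decoding step.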
Journal Introduction:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.