Hierarchical boundary feature alignment network for video salient object detection
Authors: Amin Mao, Jiebin Yan, Yuming Fang, Hantao Liu
Journal: Journal of Visual Communication and Image Representation, volume 109, Article 104435 (impact factor 2.6; JCR Q2, Computer Science, Information Systems)
Publication date: 2025-03-13
Publication type: Journal Article
DOI: 10.1016/j.jvcir.2025.104435
URL: https://www.sciencedirect.com/science/article/pii/S1047320325000495
Citations: 0
Abstract
Deep learning based video salient object detection (VSOD) models have achieved great success in the past few years. However, they still suffer from two problems: i) they struggle to accurately predict the pixels surrounding salient objects; ii) unaligned features at different scales cause deviations in feature fusion. To tackle these problems, we propose a hierarchical boundary feature alignment network (HBFA). Specifically, HBFA consists of a temporal–spatial fusion module (TSM) and three decoding branches. TSM captures multi-scale spatiotemporal information. The two boundary feature branches guide the whole network to pay more attention to the boundaries of salient objects, while the feature alignment branch fuses the features from the internal and external branches and aligns features across different scales. Extensive experiments show that the proposed method achieves new state-of-the-art performance.
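The second problem named in the abstract, fusing features that live at different scales, can be illustrated with a toy example. The sketch below is not the HBFA alignment branch: the abstract does not say how that branch works, so nearest-neighbour resizing followed by element-wise addition is used here purely as a stand-in. The names `upsample_nearest` and `fuse_aligned` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D feature map


def upsample_nearest(feat: Grid, factor: int) -> Grid:
    """Repeat every element `factor` times along both axes."""
    out: Grid = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse_aligned(fine: Grid, coarse: Grid) -> Grid:
    """Bring `coarse` to the resolution of `fine`, then add element-wise.

    Assumption: plain addition stands in for whatever fusion the paper uses.
    """
    factor = len(fine) // len(coarse)
    if len(coarse) * factor != len(fine) or len(coarse[0]) * factor != len(fine[0]):
        raise ValueError("fine size must be an integer multiple of coarse size")
    up = upsample_nearest(coarse, factor)
    return [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(fine, up)]


# A 4x4 fine-scale map and a 2x2 coarse-scale map (made-up values).
fine = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
coarse = [[10.0, 20.0], [30.0, 40.0]]
fused = fuse_aligned(fine, coarse)
```

Adding the two maps without resizing is not even well defined here, since their shapes differ. Fixed resizing of this kind gives matching shapes but can still leave content spatially offset between scales, which is the sort of deviation a learned alignment step would aim to reduce.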
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.