Journal of Visual Communication and Image Representation: Latest Articles

Multi-level cross-modal attention guided DIBR 3D image watermarking
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-30 | DOI: 10.1016/j.jvcir.2025.104455
Qingmo Chen, Zhang Wang, Zhouyan He, Ting Luo, Jiangtao Huang
For depth-image-based rendering (DIBR) 3D images, both the center view and the synthesized virtual views are subject to illegal distribution during transmission. To address the copyright protection of DIBR 3D images, we propose a multi-level cross-modal attention guided network (MCANet) for 3D image watermarking. To optimize the watermark embedding process, a watermark adjustment module (WAM) is designed to extract cross-modal information at different scales and compute 3D image attention that adjusts the watermark distribution. Furthermore, a nested dual-output U-Net (NDOU) is devised to enhance the compensatory capability of the skip connections, providing effective global features to the up-sampling process for high image quality. Compared with state-of-the-art (SOTA) 3D image watermarking methods, the proposed watermarking model shows superior robustness and imperceptibility.
Citations: 0
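As a rough, hypothetical sketch of the attention-guided embedding idea described above (module names, channel sizes, and the fusion rule are assumptions, not the authors' WAM), a cross-modal gate can re-weight a watermark residual per pixel:

```python
# Illustrative sketch only: a cross-modal attention gate that re-weights a watermark
# residual using texture and depth features. All design choices here are assumptions.
import torch
import torch.nn as nn

class CrossModalWatermarkGate(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.texture_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.depth_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),  # per-pixel embedding strength in [0, 1]
        )

    def forward(self, texture_feat, depth_feat, watermark_residual):
        fused = torch.cat([self.texture_proj(texture_feat),
                           self.depth_proj(depth_feat)], dim=1)
        strength = self.attn(fused)           # B x 1 x H x W attention map
        return watermark_residual * strength  # attenuate the watermark where it would be visible

# toy usage
gate = CrossModalWatermarkGate(64)
tex, dep = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
wm = torch.randn(1, 3, 32, 32)
adjusted = gate(tex, dep, wm)
```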
LDINet: Latent decomposition-interpolation for single image fast-moving objects deblatting
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-25 | DOI: 10.1016/j.jvcir.2025.104439
Haodong Fan, Dingyi Zhang, Yunlong Yu, Yingming Li
An image of fast-moving objects (FMOs) usually contains a blurred stripe in which the object is mixed with the background. In this work we propose a novel Latent Decomposition-Interpolation Network (LDINet) to recover the appearances and shapes of the objects from the blurry stripe contained in a single image. In particular, we introduce a Decomposition-Interpolation Module (DIM) that breaks the feature maps of the input into discrete time-indexed parts and interpolates the target latent frames according to the provided time indexes using affine transformations, where the features are categorized into scalar-like and gradient-like parts when warping during the interpolation. Finally, a decoder renders the prediction results. In addition, building on these results, a Refining Conditional Deblatting (RCD) approach is presented to further enhance fidelity. Extensive experiments show that the proposed methods outperform existing competing methods.
Citations: 0
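The idea of treating "scalar-like" and "gradient-like" features differently under an affine warp can be illustrated as follows; this is only a sketch of the general mechanism under assumed shapes, not LDINet's DIM:

```python
# Minimal sketch: warp a decomposed latent with an affine transform. Scalar-like
# channels are simply resampled; gradient-like channels (2-vectors) are resampled
# and then re-oriented by the linear part of the affine. Assumptions throughout.
import torch
import torch.nn.functional as F

def warp_latent(scalar_feat, grad_feat, theta):
    """scalar_feat: B x C x H x W, grad_feat: B x 2 x H x W, theta: B x 2 x 3 affine."""
    grid = F.affine_grid(theta, scalar_feat.shape, align_corners=False)
    warped_scalar = F.grid_sample(scalar_feat, grid, align_corners=False)
    warped_grad = F.grid_sample(grad_feat, grid, align_corners=False)
    A = theta[:, :, :2]                                        # B x 2 x 2 linear block
    B_, _, H, W = warped_grad.shape
    vecs = warped_grad.permute(0, 2, 3, 1).reshape(B_, -1, 2)  # B x HW x 2
    vecs = torch.bmm(vecs, A.transpose(1, 2))                  # rotate/scale the vectors
    warped_grad = vecs.reshape(B_, H, W, 2).permute(0, 3, 1, 2)
    return warped_scalar, warped_grad
```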
VDD: Varied Drone Dataset for semantic segmentation
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-22 | DOI: 10.1016/j.jvcir.2025.104429
Wenxiao Cai, Ke Jin, Jinyan Hou, Cong Guo, Letian Wu, Wankou Yang
Semantic segmentation of drone images is critical for various aerial vision tasks, as it provides essential semantic detail for understanding scenes on the ground. Ensuring high accuracy of semantic segmentation models for drones requires diverse, large-scale, and high-resolution datasets, which are scarce in aerial image processing. Existing datasets typically focus on urban scenes and are relatively small; our Varied Drone Dataset (VDD) addresses these limitations by offering a large-scale, densely labeled collection of 400 high-resolution images spanning 7 classes. The dataset covers urban, industrial, rural, and natural scenes, captured from different camera angles and under diverse lighting conditions. We also make new annotations for UDD (Chen et al., 2018) and UAVid (Lyu et al., 2018), integrating them under the VDD annotation standard to create the Integrated Drone Dataset (IDD). We train seven state-of-the-art models on the drone datasets as baselines. We expect the dataset to generate considerable interest in drone image segmentation and to serve as a foundation for other drone vision tasks. The datasets are publicly available at https://github.com/RussRobin/VDD.
Citations: 0
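As a hedged illustration of how such 7-class segmentation baselines are typically scored (the protocol below is a generic assumption, not the paper's official evaluation script), per-class IoU and mIoU can be computed as:

```python
# Generic mIoU computation over label maps; class count taken from the abstract.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 7) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)  # skip classes absent from both maps
    return float(np.mean(ious)) if ious else 0.0
```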
RBMark: Robust and blind video watermark in DT CWT domain
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-22 | DOI: 10.1016/j.jvcir.2025.104438
I.-Chun Huang, Ji-Yan Wu, Wei Tsang Ooi
Video watermark embedding algorithms based on the transform domain are robust against media processing methods but are limited in data embedding capacity. Learning-based watermarking algorithms have recently become popular because of their good performance in image feature extraction and data embedding, yet they are not consistently robust against various video processing methods. To effectively trade off embedding capacity and robustness, this paper proposes RBMark, a novel video watermarking method based on the Dual-Tree Complex Wavelet Transform (DT CWT). First, the watermark bit-stream is transformed into a key image in the embedding phase. Second, we extract DT CWT coefficients from both the video frame and the key image and embed the key-image coefficients into the video-frame coefficients. During extraction, the key-image coefficients are recovered to perform the inverse DT CWT and reconstruct the watermark bit-stream. Compared with prior watermarking algorithms, RBMark achieves higher robustness and embedding capacity. We evaluated its performance on multiple representative video datasets against prior transform-domain and learning-based watermarking algorithms. Experimental results demonstrate that RBMark achieves up to 98% and 99% improvement in Bit Error Rate over transform-domain and learning-based methods, respectively. Furthermore, RBMark can embed at most 2040 bits in each 1080p-resolution video frame (i.e., 9.84×10⁻⁴ bits per pixel). The source code is available at this URL.
Citations: 0
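A minimal illustration of DT CWT-domain embedding of key-image coefficients into frame coefficients, assuming the Python dtcwt package and an additive, lowpass-only rule chosen for brevity (not RBMark itself):

```python
# Rough illustration (not RBMark): embed the DT CWT lowpass band of a key image into
# the lowpass band of a video frame, then invert the transform.
# Requires the `dtcwt` package (pip install dtcwt).
import numpy as np
import dtcwt

def embed_lowpass(frame_gray: np.ndarray, key_gray: np.ndarray, alpha: float = 0.05):
    t = dtcwt.Transform2d()
    frame_pyr = t.forward(frame_gray.astype(float), nlevels=3)
    key_pyr = t.forward(key_gray.astype(float), nlevels=3)
    # toy setup: assumes the key lowpass already matches the frame lowpass shape
    frame_pyr.lowpass = frame_pyr.lowpass + alpha * key_pyr.lowpass
    return t.inverse(frame_pyr)
```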
Improving model generalization by on-manifold adversarial augmentation in the frequency domain
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-21 | DOI: 10.1016/j.jvcir.2025.104437
Chang Liu, Wenzhao Xiang, Yuan He, Hui Xue, Shibao Zheng, Hang Su
Deep Neural Networks (DNNs) often suffer performance drops when training and test data distributions differ. Ensuring model generalization on Out-Of-Distribution (OOD) data is crucial, but current models still struggle with accuracy on such data. Recent studies have shown that regular or off-manifold adversarial examples used as data augmentation improve OOD generalization. Building on this, we provide theoretical validation that on-manifold adversarial examples can enhance OOD generalization even further. However, generating such examples is challenging because of the complexity of real data manifolds. To address this, we propose AdvWavAug, an on-manifold adversarial data augmentation method built on a wavelet module. Based on the AdvProp training framework, the approach uses the wavelet transform to project an image into the wavelet domain and modifies it within the estimated data manifold. Experiments on various models and datasets, including ImageNet and its distorted versions, show that our method significantly improves model generalization, especially for OOD data.
Citations: 0
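The general mechanism of wavelet-domain augmentation can be sketched with PyWavelets; here a random perturbation stands in for the adversarial, manifold-constrained update used in the paper:

```python
# Loose sketch of the general idea (wavelet-domain perturbation), not AdvWavAug itself:
# decompose an image with a 2-D DWT, perturb the detail coefficients, and reconstruct.
import numpy as np
import pywt

def wavelet_perturb(img: np.ndarray, eps: float = 0.01, wavelet: str = "db2") -> np.ndarray:
    coeffs = pywt.wavedec2(img, wavelet, level=2)
    perturbed = [coeffs[0]]  # keep the approximation band untouched
    for (cH, cV, cD) in coeffs[1:]:
        perturbed.append(tuple(c + eps * np.random.randn(*c.shape) for c in (cH, cV, cD)))
    return pywt.waverec2(perturbed, wavelet)
```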
Multi-TuneV: Fine-tuning the fusion of multiple modules for video action recognition
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-20 | DOI: 10.1016/j.jvcir.2025.104441
Xinyuan Liu, Junyong Ye, Jingjing Wang, Guangyi Xu, Youwei Li, Chaoming Zheng
Current pre-trained models have achieved remarkable success, but they usually have complex structures and hundreds of millions of parameters, so training or fully fine-tuning a pre-trained model demands huge computational resources, which limits its transfer to different tasks. To migrate pre-trained models to Video Action Recognition (VAR), recent research adopts parameter-efficient transfer learning (PETL) approaches, but most study only a single fine-tuning module. For a complex task like VAR, a single fine-tuning method may not achieve optimal results. To address this challenge, we study joint fine-tuning with multiple modules and propose a method that merges multiple fine-tuning modules, namely Multi-TuneV. It combines five fine-tuning methods: ST-Adapter, AdaptFormer, BitFit, VPT, and LoRA. We design a dedicated architecture for Multi-TuneV and integrate it into the Video ViT model so that it can coordinate the multiple fine-tuning modules during feature extraction. Multi-TuneV enables pre-trained models to transfer to video classification tasks while maintaining improved accuracy and effectively limiting the number of tunable parameters, because it combines the advantages of the five fine-tuning methods. We conduct extensive experiments with Multi-TuneV on three common video datasets and show that it surpasses both full fine-tuning and single fine-tuning methods. When only 18.7% (16.09 M) of the full fine-tuning parameters are updated, the accuracy of Multi-TuneV on SSv2 and HMDB51 improves by 23.43% and 16.46% over the full fine-tuning strategy, reaching 67.43% and 75.84%, respectively. This demonstrates the effectiveness of joint multi-module fine-tuning. Multi-TuneV provides a new idea for PETL and a new perspective on the challenges of video understanding tasks. Code is available at https://github.com/hhh123-1/Multi-TuneV.
Citations: 0
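A toy sketch of stacking several PETL mechanisms on one frozen layer (a LoRA low-rank update plus a BitFit-style trainable bias); Multi-TuneV combines five such modules inside a Video ViT, so this only illustrates the notion of joint fine-tuning, with names and ranks chosen arbitrarily:

```python
# Frozen linear layer augmented with two PETL mechanisms at once.
import torch
import torch.nn as nn

class LoRABitFitLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        self.bias = nn.Parameter(torch.zeros(out_features))          # BitFit: bias stays trainable
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # LoRA starts as a no-op

    def forward(self, x):
        frozen = x @ self.weight.T
        lora = x @ self.lora_a.T @ self.lora_b.T
        return frozen + lora + self.bias
```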
DBFAM: A dual-branch network with efficient feature fusion and attention-enhanced gating for medical image segmentation
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-17 | DOI: 10.1016/j.jvcir.2025.104434
Benzhe Ren, Yuhui Zheng, Zhaohui Zheng, Jin Ding, Tao Wang
In medical image segmentation, convolutional neural networks (CNNs) and transformer networks have garnered significant attention due to their unique advantages. However, CNNs are limited in modeling long-range dependencies, while transformers are constrained by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising alternative: they capture long-range interactions while maintaining linear computational complexity. This paper proposes a dual-branch parallel network that combines CNNs with Visual State Space Models (VSSMs). The two encoder branches separately capture local and global information. To further exploit the relationships between local and global features, a dual-branch local–global feature fusion module is introduced to integrate features from both branches. Additionally, an Attention-Enhanced Gated Module is proposed to replace traditional skip connections and improve the alignment of information transferred between the encoder and decoder. Extensive experiments on multiple datasets validate the effectiveness of our method.
Citations: 0
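A minimal sketch of an attention-enhanced gate replacing a plain skip connection; the design below is an assumption for illustration, not the paper's exact module:

```python
# Gate the encoder skip feature with the corresponding decoder feature before merging.
import torch
import torch.nn as nn

class AttentionGatedSkip(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip_feat, decoder_feat):
        g = self.gate(torch.cat([skip_feat, decoder_feat], dim=1))  # per-channel, per-pixel gate
        return skip_feat * g + decoder_feat
```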
Cell tracking-by-detection using elliptical bounding boxes
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-14 | DOI: 10.1016/j.jvcir.2025.104425
Lucas N. Kirsten, Cláudio R. Jung
Cell detection and tracking are crucial for bio-analysis. Current approaches rely on the tracking-by-model-evolution paradigm, where end-to-end deep learning models are trained for cell detection and tracking. However, such methods require extensive amounts of annotated data, which is time-consuming to produce and often requires specialized annotators. The proposed method approximates cell shapes as oriented ellipses and uses general-purpose oriented object detectors for cell detection, alleviating the need for annotated data. A global data association algorithm then exploits temporal cell similarity using probability distance metrics, noting that the ellipses correspond to two-dimensional Gaussian distributions. The results of this study suggest that the proposed tracking-by-detection paradigm is a viable alternative for cell tracking: the method achieves competitive results and reduces the dependency on extensive annotated data, addressing a common challenge in current cell detection and tracking approaches. Our code is publicly available at https://github.com/LucasKirsten/Deep-Cell-Tracking-EBB.
Citations: 0
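The ellipse-as-Gaussian view makes probability distances straightforward to compute; the sketch below uses the Bhattacharyya distance as one such metric (the exact metric and parameterization used in the paper may differ):

```python
# Map an oriented ellipse to a 2-D Gaussian and score similarity between two detections.
import numpy as np

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Center (cx, cy), semi-axes a, b, orientation theta (radians)."""
    mu = np.array([cx, cy])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    cov = R @ np.diag([a**2, b**2]) @ R.T
    return mu, cov

def bhattacharyya(mu1, cov1, mu2, cov2):
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2
```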
Transformer-based weakly supervised 3D human pose estimation
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-14 | DOI: 10.1016/j.jvcir.2025.104432
Xiao-guang Wu, Hu-jie Xie, Xiao-chen Niu, Chen Wang, Ze-lei Wang, Shi-wen Zhang, Yu-ze Shan
Deep learning-based 3D human pose estimation methods typically require large amounts of 3D pose annotations. Because of limitations in data quality and the scarcity of 3D labels, researchers have adopted weak supervision to reduce the demand for annotated data. Compared with traditional approaches, Transformers have recently achieved remarkable success in 3D human pose estimation: leveraging their strong modeling and generalization capabilities, they capture patterns and features in the data even under limited data conditions, mitigating the issue of data scarcity. Nonetheless, the Transformer architecture struggles to capture long-term dependencies and spatio-temporal correlations between joints when processing spatio-temporal features, which limits its ability to model temporal and spatial relationships comprehensively. To address these challenges and better utilize limited labeled data under weak supervision, we propose an improved Transformer-based model. By grouping joints according to body parts, we strengthen the spatio-temporal correlations between joints. In addition, an integrated LSTM captures long-term dependencies, improving temporal sequence modeling and enabling accurate 3D poses to be generated from limited data. These structural improvements, combined with weak supervision strategies, enhance the model's performance while reducing reliance on extensive 3D annotations. Furthermore, a multi-hypothesis strategy and temporal smoothness consistency constraints regulate variations between adjacent time steps. Comparisons on the Human3.6M and HumanEva datasets validate the effectiveness of our approach.
Citations: 0
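A temporal smoothness consistency term of the kind mentioned above can be written as a penalty on adjacent-frame differences; the norm and weighting here are illustrative only, not the paper's exact loss:

```python
# Penalize changes in predicted 3-D joint positions between adjacent time steps.
import torch

def temporal_smoothness_loss(poses_3d: torch.Tensor) -> torch.Tensor:
    """poses_3d: T x J x 3 sequence of predicted joint positions."""
    return (poses_3d[1:] - poses_3d[:-1]).norm(dim=-1).mean()
```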
Hierarchical boundary feature alignment network for video salient object detection
IF 2.6 | CAS Q4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2025-03-13 | DOI: 10.1016/j.jvcir.2025.104435
Amin Mao, Jiebin Yan, Yuming Fang, Hantao Liu
Deep learning-based video salient object detection (VSOD) models have achieved great success in the past few years; however, they still suffer from two problems: (i) they struggle to accurately predict the pixels surrounding salient objects, and (ii) unaligned features of different scales cause deviations in feature fusion. To tackle these problems, we propose a hierarchical boundary feature alignment network (HBFA). Specifically, HBFA consists of a temporal–spatial fusion module (TSM) and three decoding branches. The TSM captures multi-scale spatiotemporal information. Two boundary feature branches guide the whole network to pay more attention to the boundaries of salient objects, while a feature alignment branch fuses the features from the internal and external branches and aligns features across scales. Extensive experiments show that the proposed method reaches a new state-of-the-art performance.
Citations: 0
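Boundary supervision of the sort used by boundary-aware branches is often derived from the saliency mask itself; the sketch below shows one common way to obtain such a target (HBFA's actual boundary branches are learned modules, so this is only indicative):

```python
# Derive a soft boundary map from a binary saliency mask via a max-pooling
# "morphological gradient" (dilation minus erosion).
import torch
import torch.nn.functional as F

def boundary_map(mask: torch.Tensor, width: int = 3) -> torch.Tensor:
    """mask: B x 1 x H x W binary saliency map -> boundary map of the same shape."""
    pad = width // 2
    dilated = F.max_pool2d(mask, kernel_size=width, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, kernel_size=width, stride=1, padding=pad)
    return dilated - eroded
```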