International Journal of Computer Vision: Latest Articles

Parameter Efficient Fine-Tuning for Multi-modal Generative Vision Models with Möbius-Inspired Transformation
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-13 DOI: 10.1007/s11263-025-02398-3
Haoran Duan, Shuai Shao, Bing Zhai, Tejal Shah, Jungong Han, Rajiv Ranjan
{"title":"Parameter Efficient Fine-Tuning for Multi-modal Generative Vision Models with Möbius-Inspired Transformation","authors":"Haoran Duan, Shuai Shao, Bing Zhai, Tejal Shah, Jungong Han, Rajiv Ranjan","doi":"10.1007/s11263-025-02398-3","DOIUrl":"https://doi.org/10.1007/s11263-025-02398-3","url":null,"abstract":"<p>The rapid development of multimodal generative vision models has drawn scientific curiosity. Notable advancements, such as OpenAI’s ChatGPT and Stable Diffusion, demonstrate the potential of combining multimodal data for generative content. Nonetheless, customising these models to specific domains or tasks is challenging due to computational costs and data requirements. Conventional fine-tuning methods take redundant processing resources, motivating the development of parameter-efficient fine-tuning technologies such as adapter module, low-rank factorization and orthogonal fine-tuning. These solutions selectively change a subset of model parameters, reducing learning needs while maintaining high-quality results. Orthogonal fine-tuning, regarded as a reliable technique, preserves semantic linkages in weight space but has limitations in its expressive powers. To better overcome these constraints, we provide a simple but innovative and effective transformation method inspired by Möbius geometry, which replaces conventional orthogonal transformations in parameter-efficient fine-tuning. This strategy improved fine-tuning’s adaptability and expressiveness, allowing it to capture more data patterns. Our strategy, which is supported by theoretical understanding and empirical validation, outperforms existing approaches, demonstrating competitive improvements in generation quality for key generative tasks.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"16 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143618570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-13 DOI: 10.1007/s11263-025-02374-x
Marco Cotogni, Fei Yang, Claudio Cusano, Andrew D. Bagdanov, Joost van de Weijer
{"title":"Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation","authors":"Marco Cotogni, Fei Yang, Claudio Cusano, Andrew D. Bagdanov, Joost van de Weijer","doi":"10.1007/s11263-025-02374-x","DOIUrl":"https://doi.org/10.1007/s11263-025-02374-x","url":null,"abstract":"<p>Vision transformers (ViTs) have achieved remarkable successes across a broad range of computer vision applications. As a consequence, there has been increasing interest in extending continual learning theory and techniques to ViT architectures. We propose a new method for exemplar-free class incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay which can help recalibrate previous task classifiers to the feature drift which occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks which for many applications may not be possible. To address the problem of continual ViT training, we first propose <i>gated class-attention</i> to minimize the drift in the final ViT transformer block. This mask-based gating is applied to class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose a new method of <i>feature drift compensation</i> that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal based ViT methods.(Code:https://github.com/OcraM17/GCAB-CFDC)</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"21 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143608063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attribute-Centric Compositional Text-to-Image Generation
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-13 DOI: 10.1007/s11263-025-02371-0
Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang
{"title":"Attribute-Centric Compositional Text-to-Image Generation","authors":"Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang","doi":"10.1007/s11263-025-02371-0","DOIUrl":"https://doi.org/10.1007/s11263-025-02371-0","url":null,"abstract":"<p>Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose <b>ACTIG</b>, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"23 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143618568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UniFace++: Revisiting a Unified Framework for Face Reenactment and Swapping via 3D Priors
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-11 DOI: 10.1007/s11263-025-02395-6
Chao Xu, Yijie Qian, Shaoting Zhu, Baigui Sun, Jian Zhao, Yong Liu, Xuelong Li
{"title":"UniFace++: Revisiting a Unified Framework for Face Reenactment and Swapping via 3D Priors","authors":"Chao Xu, Yijie Qian, Shaoting Zhu, Baigui Sun, Jian Zhao, Yong Liu, Xuelong Li","doi":"10.1007/s11263-025-02395-6","DOIUrl":"https://doi.org/10.1007/s11263-025-02395-6","url":null,"abstract":"<p>Face reenactment and swapping share a similar pattern of identity and attribute manipulation. Our previous work UniFace has preliminarily explored establishing a unification between the two at the feature level, but it heavily relies on the accuracy of feature disentanglement, and GANs are also unstable during training. In this work, we delve into the intrinsic connections between the two from a more general training paradigm perspective, introducing a novel diffusion-based unified method UniFace++. Specifically, this work combines the advantages of each, <i>i.e.</i>, stability of reconstruction training from reenactment, simplicity and effectiveness of the target-oriented processing from swapping, and redefining both as target-oriented reconstruction tasks. In this way, face reenactment avoids complex source feature deformation and face swapping mitigates the unstable seesaw-style optimization. The core of our approach is the rendered face obtained from reassembled 3D facial priors serving as the target pivot, which contains precise geometry and coarse identity textures. We further incorporate it with the proposed Texture-Geometry-aware Diffusion Model (TGDM) to perform texture transfer under the reconstruction supervision for high-fidelity face synthesis. Extensive quantitative and qualitative experiments demonstrate the superiority of our method for both tasks.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"20 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143599231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Investigating Self-Supervised Methods for Label-Efficient Learning
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-10 DOI: 10.1007/s11263-025-02397-4
Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammed Awais
{"title":"Investigating Self-Supervised Methods for Label-Efficient Learning","authors":"Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammed Awais","doi":"10.1007/s11263-025-02397-4","DOIUrl":"https://doi.org/10.1007/s11263-025-02397-4","url":null,"abstract":"<p>Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks, including classification, segmentation, and detection. However, the potential of these models for low-shot learning across several downstream tasks remains largely under explored. In this work, we conduct a systematic examination of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, to assess their low-shot capabilities by comparing different pretrained models. In addition, we explore the impact of various collapse avoidance techniques, such as centring, ME-MAX, and sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework that combines mask image modelling and clustering as pretext tasks. This framework demonstrates superior performance across all examined low-shot downstream tasks, including multi-class classification, multi-label classification and semantic segmentation. Furthermore, when testing the model on large-scale datasets, we show performance gains in various tasks.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"2 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semantics-Conditioned Generative Zero-Shot Learning via Feature Refinement
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-10 DOI: 10.1007/s11263-025-02394-7
Shiming Chen, Ziming Hong, Xinge You, Ling Shao
{"title":"Semantics-Conditioned Generative Zero-Shot Learning via Feature Refinement","authors":"Shiming Chen, Ziming Hong, Xinge You, Ling Shao","doi":"10.1007/s11263-025-02394-7","DOIUrl":"https://doi.org/10.1007/s11263-025-02394-7","url":null,"abstract":"<p>Generative zero-shot learning (ZSL) recognizes novel categories by employing a cross-modal generative model conditioned on semantic factors (such as attributes) to transfer knowledge from seen classes to unseen ones. Many existing generative ZSL methods rely solely on feature extraction models pre-trained on ImageNet, disregarding the cross-dataset bias between ImageNet and ZSL benchmarks. This bias inevitably leads to suboptimal visual features that lack semantic relevance to the predefined attributes, constraining the generator’s ability to synthesize semantically meaningful visual features for generative ZSL. In this paper, we introduce a visual feature refinement method (ViFR) to mitigate cross-dataset bias and advance generative ZSL. Given a generative ZSL model, ViFR incorporates both pre-feature refinement (Pre-FR) and post-feature refinement (Post-FR) modules to simultaneously enhance visual features. In Pre-FR, ViFR aims to learn attribute localization for discriminative visual feature representations using an attribute-guided attention mechanism optimized with attribute-based cross-entropy loss. In Post-FR, ViFR learns an effective visual<span>(rightarrow )</span>semantic mapping by integrating the semantic-conditioned generator into a unified generative model to enhance visual features. Additionally, we propose a self-adaptive margin center loss (SAMC-loss) that collaborates with semantic cycle-consistency loss to guide Post-FR in learning class- and semantically-relevant representations. The features in Post-FR are concatenated to form fully refined visual features for ZSL classification. Extensive experiments on benchmark datasets (i.e., CUB, SUN, and AWA2) demonstrate that ViFR outperforms state-of-the-art ZSL approaches. Our implementation is publicly available at https://github.com/shiming-chen/ViFR.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"13 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Hierarchical Learning for 3D Semantic Segmentation
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-06 DOI: 10.1007/s11263-025-02387-6
Chongshou Li, Yuheng Liu, Xinke Li, Yuning Zhang, Tianrui Li, Junsong Yuan
{"title":"Deep Hierarchical Learning for 3D Semantic Segmentation","authors":"Chongshou Li, Yuheng Liu, Xinke Li, Yuning Zhang, Tianrui Li, Junsong Yuan","doi":"10.1007/s11263-025-02387-6","DOIUrl":"https://doi.org/10.1007/s11263-025-02387-6","url":null,"abstract":"<p>The inherent structure of human cognition facilitates the hierarchical organization of semantic categories for three-dimensional objects, simplifying the visual world into distinct and manageable layers. A vivid example is observed in the animal-taxonomy domain, where distinctions are not only made between broader categories like birds and mammals but also within subcategories such as different bird species, illustrating the depth of human hierarchical processing. This observation bridges to the computational realm as this paper presents deep hierarchical learning (DHL) on 3D data. By formulating a probabilistic representation, our proposed DHL lays a pioneering theoretical foundation for hierarchical learning (HL) in visual tasks. Addressing the primary challenges in effectiveness and generality of DHL for 3D data, we 1) introduce a hierarchical regularization term to connect hierarchical coherence across the predictions with the classification loss; 2) develop a general deep learning framework with a hierarchical embedding fusion module for enhanced hierarchical embedding learning; and 3) devise a novel method for constructing class hierarchies in datasets with non-hierarchical labels, leveraging recent vision language models. A novel hierarchy quality indicator, CH-MOS, supported by questionnaire-based surveys, is developed to evaluate the semantic explainability of the generated class hierarchy for human understanding. Our methodology’s validity is confirmed through extensive experiments on multiple datasets for 3D object and scene point cloud semantic segmentation tasks, demonstrating DHL’s capability in parsing 3D data across various hierarchical levels. This evidence suggests DHL’s potential for broader applicability to a wide range of tasks.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"56 81 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143561192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Temporal Transductive Inference for Few-Shot Video Object Segmentation
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-06 DOI: 10.1007/s11263-025-02390-x
Mennatullah Siam
{"title":"Temporal Transductive Inference for Few-Shot Video Object Segmentation","authors":"Mennatullah Siam","doi":"10.1007/s11263-025-02390-x","DOIUrl":"https://doi.org/10.1007/s11263-025-02390-x","url":null,"abstract":"<p>Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training. In this paper, we present a simple but effective temporal transductive inference (TTI) approach that leverages temporal consistency in the unlabelled video frames during few-shot inference without episodic training. Key to our approach is the use of a video-level temporal constraint that augments frame-level constraints. The objective of the video-level constraint is to learn consistent linear classifiers for novel classes across the image sequence. It acts as a spatiotemporal regularizer during the transductive inference to increase temporal coherence and reduce overfitting on the few-shot support set. Empirically, our approach outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.5%. In addition, we introduce an improved benchmark dataset that is exhaustively labelled (i.e., all object occurrences are labelled, unlike the currently available). Our empirical results and temporal consistency analysis confirm the added benefits of the proposed spatiotemporal regularizer to improve temporal coherence. Our code and benchmark dataset is publicly available at, https://github.com/MSiam/tti_fsvos/.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"24 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143560823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-06 DOI: 10.1007/s11263-025-02393-8
Yi Liu, Chengxin Li, Shoukun Xu, Jungong Han
{"title":"Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding","authors":"Yi Liu, Chengxin Li, Shoukun Xu, Jungong Han","doi":"10.1007/s11263-025-02393-8","DOIUrl":"https://doi.org/10.1007/s11263-025-02393-8","url":null,"abstract":"<p>Multi-modal fusion has played a vital role in multi-modal scene understanding. Most existing methods focus on cross-modal fusion involving two modalities, often overlooking more complex multi-modal fusion, which is essential for real-world applications like autonomous driving, where visible, depth, event, LiDAR, etc., are used. Besides, few attempts for multi-modal fusion, e.g., simple concatenation, cross-modal attention, and token selection, cannot well dig into the intrinsic shared and specific details of multiple modalities. To tackle the challenge, in this paper, we propose a Part-Whole Relational Fusion (PWRF) framework. For the first time, this framework treats multi-modal fusion as part-whole relational fusion. It routes multiple individual part-level modalities to a fused whole-level modality using the part-whole relational routing ability of Capsule Networks (CapsNets). Through this part-whole routing, our PWRF generates modal-shared and modal-specific semantics from the whole-level modal capsules and the routing coefficients, respectively. On top of that, modal-shared and modal-specific details can be employed to solve the issue of multi-modal scene understanding, including synthetic multi-modal segmentation and visible-depth-thermal salient object detection in this paper. Experiments on several datasets demonstrate the superiority of the proposed PWRF framework for multi-modal scene understanding. The source code has been released on https://github.com/liuyi1989/PWRF.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"33 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143561193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning
IF 19.5, Q2, Computer Science
International Journal of Computer Vision Pub Date: 2025-03-06 DOI: 10.1007/s11263-025-02389-4
Feiyang Yang, Xiongfei Li, Bo Wang, Peihong Teng, Guifeng Liu
{"title":"UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning","authors":"Feiyang Yang, Xiongfei Li, Bo Wang, Peihong Teng, Guifeng Liu","doi":"10.1007/s11263-025-02389-4","DOIUrl":"https://doi.org/10.1007/s11263-025-02389-4","url":null,"abstract":"<p>Multimodal medical image segmentation is crucial for enhancing diagnostic accuracy in various clinical settings. However, due to the difficulty of obtaining complete data in real clinical settings, the use of unpaired and unlabeled multimodal data is severely limited. This results in unpaired data being unusable as simultaneous input for models due to spatial misalignments and morphological differences, and unlabeled data failing to provide effective supervisory signals for models. To alleviate these issues, we propose a semi-supervised multimodal segmentation method based on cross-modal generative that seamlessly integrates image translation and segmentation stages. In the cross-modalities generative stage, we employ adversarial learning to discern the latent anatomical correlations across various modalities, followed by maintaining a balance between semantic consistency and structural consistency in image translation through region-aware constraints and cross-modal structural information contrastive learning with dynamic weight adjustment. In the segmentation stage, we employ a teacher-student semi-supervised learning (SSL) framework where the student network distills multimodal knowledge from the teacher network and utilizes unlabeled source data to enhance the supervisory signal. Experimental results demonstrate that our proposed method achieves state-of-the-art performance in extensive experiments on the segmentation tasks of cardiac substructures and multi-organs abdominal, outperforming other competitive methods.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"87 1 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0