Image and Vision Computing: Latest Articles

Design of a novel fuzzy ensemble CNN framework for ovarian cancer classification using Tissue Microarray images
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-19, DOI: 10.1016/j.imavis.2025.105604
Authors: Thien B. Nguyen-Tat, Anh T. Vu-Xuan, Vuong M. Ngo
Abstract:
Background: Ovarian cancer remains a significant health concern, with a high mortality rate often attributed to late diagnosis. Tissue Microarray (TMA) images offer a cost-effective diagnostic tool, but their manual analysis is time-consuming and requires expert interpretation. To address this, we aim to develop an automated deep learning solution.
Purpose: This study seeks to develop a robust deep learning method for classifying ovarian cancer TMA images. Specifically, we compare the performance of different Convolutional Neural Network (CNN) architectures and propose an improved ensemble model to enhance diagnostic accuracy and streamline the clinical workflow.
Methods: The training dataset comprises 12,710 TMA images sourced from various repositories. These images were meticulously labeled into five distinct categories (CC, EC, HGSC, LGSC, and MC) using original data sources and expert annotations. In the first stage, we trained five CNN models, including our proposed EOC-Net and four transfer learning models: DenseNet121, EfficientNetB0, InceptionV3, and ResNet50-v2. In the second stage, we constructed a fuzzy rank-based ensemble model utilizing the Gamma function to combine the predictions from the individual models, aiming to optimize overall accuracy.
Results: In the first stage, the models achieved training accuracies ranging from 86.95% to 96.29% and testing accuracies ranging from 76.25% to 87.05%. Notably, EOC-Net, despite having significantly fewer parameters, emerged as the top-performing model. In the second stage, the proposed ensemble model surpassed all individual models, achieving an accuracy of 88.73%, a substantial improvement of 1.68%–12.48%.
Conclusion: Our study underscores the potential of deep learning and ensemble learning techniques for accurately classifying ovarian cancer TMA images. The ensemble model's superior performance demonstrates its ability to enhance diagnostic precision, potentially reducing the workload for clinical experts and improving patient outcomes.
Image and Vision Computing, Volume 161, Article 105604
Citations: 0
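The second-stage fusion can be pictured with a short sketch: each CNN's softmax output is mapped to a fuzzy penalty (low penalty for confident predictions), the penalties are accumulated across models, and the class with the smallest accumulated penalty wins. The Gamma-based transform below is a hypothetical stand-in for the paper's rank function, which is not specified in the abstract; the function name and the dummy data are illustrative.

```python
import numpy as np
from scipy.special import gamma  # Gamma function; the paper's exact rank formula is not given here


def fuzzy_rank_fusion(prob_list):
    """Fuse per-model softmax outputs via a fuzzy rank scheme.

    prob_list: list of arrays, each of shape (n_samples, n_classes), one per
    trained CNN. This is an illustrative sketch of the general fuzzy rank idea,
    not the paper's exact Gamma-based formulation.
    """
    fused_penalty = np.zeros_like(prob_list[0])
    for probs in prob_list:
        # Map each confidence to a penalty: high confidence -> low penalty.
        # The Gamma-function transform below is a hypothetical stand-in.
        penalty = 1.0 - probs / gamma(1.0 + probs)
        fused_penalty += penalty
    # The class with the smallest accumulated penalty across models is selected.
    return fused_penalty.argmin(axis=1)


# Example with three dummy models, 4 samples, and 5 classes (CC, EC, HGSC, LGSC, MC)
rng = np.random.default_rng(0)
dummy = [rng.dirichlet(np.ones(5), size=4) for _ in range(3)]
print(fuzzy_rank_fusion(dummy))
```

The appeal of rank-style fusion over plain probability averaging is that a single over-confident model cannot dominate the decision; each model contributes through a bounded penalty.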
SFDFNet: Leveraging spatial-frequency deep fusion for RGB-T semantic segmentation
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-18, DOI: 10.1016/j.imavis.2025.105605
Authors: Guanhua An, Yuhe Geng, Shengyu Fang, Jichang Guo
Abstract: Owing to their insensitivity to lighting variations, RGB-Thermal (RGB-T) semantic segmentation models show significant potential for processing images captured under adverse conditions such as low light and overexposure. Current RGB-T semantic segmentation methods usually rely on complex spatial-domain fusion strategies, yet they neglect the complementary frequency characteristics of the RGB and thermal modalities. Through frequency analysis, we find that thermal images concentrate on low-frequency information, while RGB images are rich in high-frequency details. Leveraging these complementary properties, we introduce the Spatial-Frequency Deep Fusion Network (SFDFNet), which employs a dual-stream architecture to enhance RGB-T semantic segmentation. Key innovations include the Distinctive Feature Enhancement Module (DFEM), which improves feature representation in both modalities, and the Spatial-Frequency Fusion Module (SFFM), which integrates spatial and frequency features to optimize cross-modal fusion. Extensive experiments on three RGB-T datasets demonstrate the superior performance of our method, both qualitatively and quantitatively, compared to state-of-the-art models.
Image and Vision Computing, Volume 161, Article 105605
Citations: 0
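The complementary-frequency observation (thermal carries low frequencies, RGB carries high frequencies) can be illustrated with a minimal PyTorch sketch that masks the two spectra and recombines them. This is not the paper's SFFM; the mask radius and the hard low/high split are assumptions made only for illustration.

```python
import torch
import torch.fft


def frequency_complementary_fusion(rgb_feat, thermal_feat, radius=0.25):
    """Keep low-frequency content from the thermal branch and high-frequency
    content from the RGB branch, then return to the spatial domain.
    rgb_feat, thermal_feat: (B, C, H, W) float tensors."""
    B, C, H, W = rgb_feat.shape
    # Centered frequency grid; distances below `radius` count as low frequency.
    fy = torch.fft.fftshift(torch.fft.fftfreq(H, device=rgb_feat.device))
    fx = torch.fft.fftshift(torch.fft.fftfreq(W, device=rgb_feat.device))
    dist = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low_mask = (dist <= radius).to(rgb_feat.dtype)  # (H, W)

    rgb_spec = torch.fft.fftshift(torch.fft.fft2(rgb_feat), dim=(-2, -1))
    thermal_spec = torch.fft.fftshift(torch.fft.fft2(thermal_feat), dim=(-2, -1))

    # Thermal supplies the low band, RGB supplies everything else.
    fused_spec = thermal_spec * low_mask + rgb_spec * (1.0 - low_mask)
    return torch.fft.ifft2(torch.fft.ifftshift(fused_spec, dim=(-2, -1))).real


rgb = torch.randn(2, 64, 32, 32)
thr = torch.randn(2, 64, 32, 32)
print(frequency_complementary_fusion(rgb, thr).shape)  # torch.Size([2, 64, 32, 32])
```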
Visual-Aware Text as Query for Referring Video Object Segmentation
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-17, DOI: 10.1016/j.imavis.2025.105608
Authors: Qi Kuang, Ying Chen
Abstract: Current referring video object segmentation (R-VOS) approaches rely on directly identifying, locating, and segmenting referenced objects from text referring expressions in videos. However, there are inherent ambiguities in text referring expressions that can significantly degrade model performance. To address this challenge, a novel R-VOS method taking Visual-Aware Text as Query (VATaQ) is proposed, in which the referring expression is reconstructed under the guidance of visual features, making the text features highly relevant to the current video and thereby enhancing the clarity of the expressions. Furthermore, a CLIP-side Adapter Module (CAM) is introduced, which leverages the semantically enriched CLIP model to enhance the visual features with more semantic information, helping the model achieve a more comprehensive multi-modal representation. Experimental results show that VATaQ delivers outstanding performance on four video benchmark datasets, outperforming the baseline network by 3.4% on the largest, Ref-YouTube-VOS.
Image and Vision Computing, Volume 161, Article 105608
Citations: 0
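The core idea of conditioning the text query on the video can be sketched as a single cross-attention step in which text tokens attend to visual tokens. The module below is an illustration under assumptions (dimensions, the residual connection, and the normalization are guesses), not the actual VATaQ design.

```python
import torch
import torch.nn as nn


class VisualAwareTextQuery(nn.Module):
    """Refine text-token embeddings by cross-attending to visual features so the
    resulting queries are tied to the current video."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, visual_tokens):
        # text_tokens:   (B, L_text, dim) -- embedded referring expression
        # visual_tokens: (B, L_vis, dim)  -- flattened per-frame visual features
        refined, _ = self.cross_attn(query=text_tokens,
                                     key=visual_tokens,
                                     value=visual_tokens)
        # Residual connection keeps the original linguistic content.
        return self.norm(text_tokens + refined)


text = torch.randn(2, 12, 256)     # e.g. a 12-word referring expression
visual = torch.randn(2, 400, 256)  # e.g. a 20x20 feature map, flattened
print(VisualAwareTextQuery()(text, visual).shape)  # torch.Size([2, 12, 256])
```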
Multimodal Sensitive Adaptive Transformer for 3D medical image segmentation
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-16, DOI: 10.1016/j.imavis.2025.105606
Authors: Zhibing Wang, Wenmin Wang, Nannan Li, Qi Chen, Yifan Zhang, Meng Xiao, Haomei Jia, Shenyong Zhang
Abstract: Three-dimensional medical image segmentation presents a significant challenge within the field, with the segmentation of multiple organs and lesions in MRI images being particularly demanding. This paper introduces an innovative approach utilizing Multimodal Sensitive Adaptive Attention (MSAA). We refer to this new structure as the Multimodal Sensitive Adaptive Transformer Network (MSAT), which incorporates downsampling and MSAA into the encoding phase and integrates skip connections from different layers, MSAA outputs, and upsampled feature outputs into the decoding phase. The MSAT consists of two primary components. The first is designed to extract a richer set of high-dimensional features through an advanced network architecture, integrating skip connections from different layers, MSAA outputs, and the results of the preceding upsampling layer. The second is the Multimodal Sensitive Adaptive Attention block, which integrates two attention mechanisms: Local Sensitive Adaptive Attention (LSAA) and Spatial Sensitive Adaptive Attention (SSAA). These attention mechanisms work synergistically to blend high- and low-dimensional features effectively, thereby enriching the contextual information captured by the model. Our experiments, conducted across several datasets including Synapse, BTCV, ACDC, and BraTS 2021, demonstrate that MSAT outperforms existing methods, showing superior segmentation capability for 3D multi-organ, cardiac, and brain tumor segmentation tasks.
Image and Vision Computing, Volume 161, Article 105606
Citations: 0
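A minimal sketch of the decoding step described above (encoder skip connection, MSAA output, and upsampled features from the previous decoder layer, fused by a 3D convolution) is given below. Channel sizes, normalization, and activation are placeholder choices, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderFusionBlock(nn.Module):
    """Concatenate the skip connection, the attention output, and the upsampled
    previous decoder features, then mix them with a 3D convolution."""

    def __init__(self, skip_ch, attn_ch, up_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(skip_ch + attn_ch + up_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, skip, attn_out, prev_decoder):
        # Bring the coarser decoder features up to the skip connection's resolution.
        up = F.interpolate(prev_decoder, size=skip.shape[2:],
                           mode="trilinear", align_corners=False)
        return self.fuse(torch.cat([skip, attn_out, up], dim=1))


block = DecoderFusionBlock(skip_ch=32, attn_ch=32, up_ch=64, out_ch=32)
skip = torch.randn(1, 32, 16, 64, 64)
attn = torch.randn(1, 32, 16, 64, 64)
prev = torch.randn(1, 64, 8, 32, 32)
print(block(skip, attn, prev).shape)  # torch.Size([1, 32, 16, 64, 64])
```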
Semi-supervised cross-modality person re-identification based on pseudo label learning
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-16, DOI: 10.1016/j.imavis.2025.105602
Authors: Fei Wu, Ruixuan Zhou, Yang Gao, Yujian Feng, Qinghua Huang, Xiao-Yuan Jing
Abstract: Visible-infrared person re-identification (RGB-IR Re-ID) aims to find images of the same identity across different modalities. In practice, multiple persons and cameras provide abundant training samples, while non-negligible modality differences make manual labeling of all samples impractical. How to accurately re-identify cross-modality pedestrians when only a few labeled samples and a large quantity of unlabeled samples are available for training is therefore an important research question. However, person re-identification in this scenario, which we call Semi-Supervised Cross-Modality Re-ID (SSCM Re-ID), has not been well studied. In this paper, we propose a cross-modality pseudo label learning (CPL) framework for the SSCM Re-ID task. It consists of three modules: the feature mapping module, the identity alignment module, and the pseudo-label generation module. The feature mapping module extracts shared discriminative features from modality-specific channels; the identity alignment module aligns person identities jointly at the global and part levels; and the pseudo-label generation module selects unlabeled samples with reliable pseudo labels based on their confidence. Moreover, we propose a dynamic center-based cross-entropy loss to constrain the distance between similar samples. Experiments on widely used cross-modality Re-ID datasets demonstrate that CPL achieves state-of-the-art SSCM Re-ID performance.
Image and Vision Computing, Volume 161, Article 105602
Citations: 0
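The confidence-based selection in the pseudo-label generation module can be sketched in a few lines: keep only the unlabeled samples whose top softmax probability clears a threshold. The threshold value and the exact selection rule are assumptions made for illustration, not the paper's criterion.

```python
import torch


def select_pseudo_labels(logits, threshold=0.9):
    """Assign pseudo labels to unlabeled samples whose prediction confidence
    (top softmax probability) exceeds `threshold`.

    Returns the pseudo labels of the kept samples and their indices in the batch."""
    probs = torch.softmax(logits, dim=1)
    confidence, pseudo_labels = probs.max(dim=1)
    keep = confidence >= threshold
    return pseudo_labels[keep], keep.nonzero(as_tuple=True)[0]


logits = torch.randn(8, 200)  # a batch of 8 unlabeled samples over 200 identity classes (placeholder)
labels, indices = select_pseudo_labels(logits)
print(labels.shape, indices.shape)
```

Only the selected samples would then join the labeled pool for the supervised losses, which is what keeps noisy pseudo labels from dominating training.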
Enhancing radiology report generation: A prior knowledge-aware transformer network for effective alignment and fusion of multi-modal radiological data
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-16, DOI: 10.1016/j.imavis.2025.105603
Authors: Amaan Izhar, Norisma Idris, Nurul Japar
Abstract: Medical images and their reports are essential in healthcare, providing crucial insights into internal structures and abnormalities for diagnosis and treatment. However, radiology report generation is time-consuming and further complicated by the shortage of expert radiologists. This paper presents a deep learning-based prior knowledge-aware transformer network designed to address the challenges of aligning and fusing medical images with textual data. Our method integrates medical signals, namely contextual biomedical entities and auxiliary medical knowledge embeddings extracted from reports, with the visual features of radiology images to enhance alignment. To tackle the fusion issue, we introduce the Prior-Knowledge-Aware-Report-Generator, a novel module with pre-normalization layers for improved training stability and efficiency, a prior knowledge-aware cross-attention mechanism that focuses on the unified multi-modal prior-knowledge representation of radiology images and medical signals, and a feed-forward layer using the SwiGLU gated activation function to enhance receptive field coverage. This design ensures the model effectively incorporates and exploits prior medical knowledge to generate high-quality reports. We evaluate our method using standard natural language generation metrics on three widely used publicly available datasets: IUXRAY, COVCTR, and PGROSS. Our approach achieves average BLEU scores of 0.383, 0.647, and 0.191 on the respective datasets, outperforming existing state-of-the-art methods, as further evidenced by rigorous ablation and qualitative analyses that examine the contributions of the various components of our model and yield relevant clinical insights. The results demonstrate that fusing medical signals with radiology images significantly improves report accuracy and alignment with clinical findings, providing valuable assistance to radiologists.
Image and Vision Computing, Volume 161, Article 105603
Citations: 0
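The SwiGLU feed-forward layer mentioned in the abstract is a standard construction (Shazeer, 2020): the hidden activation is a SiLU-gated product of two linear projections. The sketch below shows that standard construction only; the hidden width, the bias-free projections, and the dimensions are conventional choices assumed here, not taken from the paper.

```python
import torch
import torch.nn as nn


class SwiGLUFeedForward(nn.Module):
    """SwiGLU gated feed-forward layer: down(SiLU(gate(x)) * up(x))."""

    def __init__(self, dim=512, hidden=1365):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        # The SiLU-activated gate modulates the second projection element-wise.
        return self.w_down(self.act(self.w_gate(x)) * self.w_up(x))


x = torch.randn(2, 60, 512)  # (batch, report tokens, model dim)
print(SwiGLUFeedForward()(x).shape)  # torch.Size([2, 60, 512])
```

The gating lets the network learn which hidden features to pass through per token, which is why SwiGLU variants tend to train more stably than a plain ReLU feed-forward layer of the same size.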
Point-cloud-based hand gesture recognition using principal component analysis and boundary extraction
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-14, DOI: 10.1016/j.imavis.2025.105625
Authors: Yiwen Zhang, Dong An, Dongzhao Yang, Tianxu Xu, Yuxuan He, Qiang Wang, Zhongqi Pan, Yang Yue
Abstract: In this work, we introduce a method for hand gesture recognition that utilizes a Time-of-Flight (ToF) camera and 3D point cloud networks. A dataset of hand gesture point clouds, specifically the digits 0–9, is created using a ToF depth camera. These data are then processed with a data compression algorithm that combines point cloud principal component analysis (PCA) with point cloud boundary extraction. The effectiveness of the proposed compression algorithm for hand gesture recognition is evaluated using seven different point cloud recognition networks. The experimental results demonstrate that the algorithm not only generalizes across various point cloud classification models but also significantly reduces the size of the hand gesture point cloud data while maintaining high recognition accuracy.
Image and Vision Computing, Volume 161, Article 105625
Citations: 0
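A hedged sketch of the compression idea: align the hand point cloud with its principal axes via PCA and keep only boundary points. The convex hull below is a stand-in for the paper's boundary-extraction step, which is not detailed in the abstract.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA


def compress_hand_cloud(points):
    """Rotate the hand point cloud into its principal frame (PCA), then keep
    only boundary points as a compact representation of the gesture shape.

    points: (N, 3) array of 3D points from a ToF depth camera."""
    aligned = PCA(n_components=3).fit_transform(points)  # principal-axis alignment
    hull = ConvexHull(aligned)                           # boundary-point indices
    return aligned[hull.vertices]


cloud = np.random.rand(2048, 3)  # dummy hand gesture point cloud
compressed = compress_hand_cloud(cloud)
print(cloud.shape, "->", compressed.shape)
```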
Efficient Mamba: Overcoming the visual limitations of Mamba with innovative structures
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-12, DOI: 10.1016/j.imavis.2025.105569
Authors: Wei Xu, Yi Wan, Dong Zhao, Long Zhang
Abstract: Mamba models have emerged as strong competitors to Transformers due to their efficient long-sequence processing and high memory efficiency. However, their state space models (SSMs) suffer from limited long-range dependency capture, a lack of channel interactions, and weak generalization in vision tasks. To address these issues, we propose Efficient Mamba (EMB), an innovative framework that enhances SSMs while integrating convolutional neural networks (CNNs) and Transformers to mitigate their inherent drawbacks. The key contributions of EMB are as follows: (1) We introduce the TransSSM module, which incorporates feature flipping and channel shuffle to enhance channel interactions and improve generalization. Additionally, we propose the Window Spatial Attention (WSA) module for precise local feature modeling and Dual Pooling Attention (DPA) to improve global feature modeling and model stability. (2) We design the MFB-SCFB composite structure, which integrates TransSSM, WSA, Inverted Residual Blocks (IRBs), and convolutional attention modules to facilitate effective global-local feature interaction. EMB achieves state-of-the-art (SOTA) performance across multiple vision tasks. For instance, on ImageNet classification, EMB-S/T/N achieve Top-1 accuracies of 78.9%, 76.3%, and 73.5%, with model sizes and FLOPs of 5.9M/1.5G, 2.5M/0.6G, and 1.4M/0.3G, respectively, when trained on a single NVIDIA 4090 GPU. Experimental results demonstrate that EMB provides a novel paradigm for efficient vision model design, offering valuable insights for future SSM research. Code: https://github.com/Xuwei86/EMB/tree/main
Image and Vision Computing, Volume 161, Article 105569
Citations: 0
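Feature flipping and channel shuffle are both simple, well-known operations; the sketch below shows one plausible way to apply them to a token sequence before an SSM block. How TransSSM actually combines them with the state space model is not reproduced here, and the group count is an arbitrary choice.

```python
import torch


def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle: interleave channels across groups so
    that group-wise operations can exchange information across groups."""
    b, c, n = x.shape
    return x.reshape(b, groups, c // groups, n).transpose(1, 2).reshape(b, c, n)


def flip_and_shuffle(x, groups=4):
    """Reverse the token sequence (so a directional SSM also sees it backwards)
    and shuffle channels to mix information across channel groups."""
    flipped = torch.flip(x, dims=[-1])  # reverse the sequence dimension
    return channel_shuffle(flipped, groups)


x = torch.randn(2, 64, 196)  # (batch, channels, tokens) for a 14x14 feature map
print(flip_and_shuffle(x).shape)  # torch.Size([2, 64, 196])
```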
Deep learning advances in breast medical imaging with a focus on clinical readiness and radiologists' perspective
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-12, DOI: 10.1016/j.imavis.2025.105601
Authors: Oladosu Oyebisi Oladimeji, Abdullah Al-Zubaer Imran, Xiaoqin Wang, Saritha Unnikrishnan
Abstract: Breast cancer is the leading cause of cancer death among women globally. According to the World Health Organization (WHO), early detection and treatment can significantly reduce surgeries and improve survival rates. Since deep learning emerged in 2012, it has garnered significant research interest in breast cancer, particularly for diagnosis, treatment, prognosis, and survival prediction. This review focuses on the application of deep learning to breast image analysis (MRI, mammography, and ultrasound), with a particular emphasis on radiologist involvement in the evaluation process. Studies published between 2019 and 2024 in the Scopus database are reviewed. We further explore radiologists' perspectives on the clinical readiness of artificial intelligence (AI) for breast image analysis. By analyzing insights from published articles, we discuss the challenges, limitations, and future directions of this evolving field. While the review highlights the promise of deep learning in breast image analysis, it also acknowledges critical issues that must be addressed before widespread clinical integration can be achieved.
Image and Vision Computing, Volume 161, Article 105601
Citations: 0
Memory augmented using diffusion model for class-incremental learning
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-06-10, DOI: 10.1016/j.imavis.2025.105600
Authors: Quentin Jodelet, Xin Liu, Yin Jun Phua, Tsuyoshi Murata
Abstract: Class-incremental learning aims to learn new classes incrementally without forgetting the previously learned ones. Several research works have shown how additional data can be used by incremental models to help mitigate catastrophic forgetting. In this work, following the recent breakthrough in text-to-image generative models and their wide availability, we propose the use of a pre-trained diffusion model as a source of additional data for class-incremental learning. Compared to competitive methods that rely on external, often unlabeled, datasets of real images, our approach can generate synthetic samples belonging to the same classes as the previously encountered images. This allows us to use these additional samples not only in the distillation loss but also for replay in supervised losses such as the classification loss. Experiments on the competitive benchmarks CIFAR100, ImageNet-Subset, and ImageNet demonstrate how this new approach can further improve the performance of state-of-the-art class-incremental learning methods on large-scale datasets.
Image and Vision Computing, Volume 161, Article 105600
Citations: 0
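The replay-generation step can be sketched with the Hugging Face diffusers API: prompt a pre-trained text-to-image model with the names of previously learned classes and keep the generated images as labeled replay data. The prompt template, sample count, and checkpoint name are illustrative assumptions, not the paper's settings.

```python
import torch
from diffusers import StableDiffusionPipeline


def generate_replay_images(past_class_names, per_class=4,
                           model_id="runwayml/stable-diffusion-v1-5"):
    """Generate synthetic replay samples for previously learned classes by
    prompting a pre-trained text-to-image diffusion model with the class names.

    Requires a CUDA GPU and network access to download the checkpoint."""
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    replay = {}
    for name in past_class_names:
        out = pipe(f"a photo of a {name}", num_images_per_prompt=per_class)
        replay[name] = out.images  # list of PIL images, all labeled with `name`
    return replay


# e.g. classes from an earlier CIFAR100 task
synthetic = generate_replay_images(["apple", "bicycle", "castle"])
```

Because each generated image inherits the label of its prompt, these samples can be mixed into both the classification loss and the distillation loss, which is the role the paper assigns to them.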