Multi-View Fusion Graph Attention Network for Multilabel Class Incremental Learning
Anhui Tan, Yu Wang, Wei-Zhi Wu, Weiping Ding, Jiye Liang
Information Fusion, Volume 123, Article 103309
DOI: 10.1016/j.inffus.2025.103309
Published: 2025-05-22
URL: https://www.sciencedirect.com/science/article/pii/S1566253525003823
Citation count: 0
Abstract
Multilabel Class-Incremental Learning (MLCIL) is a variant of class-incremental learning and multilabel learning in which models learn from images or data associated with multiple labels, and new sets of classes are introduced incrementally. However, most existing MLCIL methods rely heavily on limited single-view features, which makes it difficult to capture class-specific characteristics and the correlations between different labels. Furthermore, MLCIL faces both intra-class and inter-class imbalances, which arise from the varying frequencies of class occurrences during each incremental session. To address these issues, we propose a novel MLCIL model called the Multi-View Fusion Graph Attention Network (MVGAT). First, the MVGAT architecture includes a multi-view feature extraction module that fuses class node features from three different perspectives of images, effectively capturing both local and global class-specific information. Second, MVGAT introduces a multi-view attention fusion module that combines the multi-view class node features based on label correlations. Importantly, the attention fusion modules trained in previous learning sessions are preserved, helping to mitigate catastrophic forgetting by providing independent probability predictions for their respective learned classes. Additionally, MVGAT is equipped with a pseudo-label correction module that improves the accuracy of pseudo-labels by integrating predictions from the current session with those from the historical frozen attention fusion modules. Moreover, an asymmetric loss function is developed to balance intra-class and inter-class performance by dynamically adjusting negative focusing parameters based on class occurrence frequency. Finally, experimental results on benchmark datasets demonstrate that MVGAT outperforms existing state-of-the-art methods.
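The abstract does not give the exact form of the asymmetric loss, but the general idea — down-weighting easy negatives with a per-class focusing parameter that depends on class frequency — can be sketched along the lines of the well-known asymmetric loss for multilabel classification. In this sketch the frequency-based scaling rule (`gamma_neg_base * freq / freq.max()`) is a hypothetical choice for illustration, not the paper's formula:

```python
import numpy as np

def asymmetric_loss(probs, targets, class_freq,
                    gamma_pos=0.0, gamma_neg_base=4.0):
    """Illustrative asymmetric multilabel loss.

    probs:      (batch, classes) predicted probabilities in (0, 1)
    targets:    (batch, classes) binary ground-truth labels
    class_freq: (classes,) occurrence counts per class

    The negative focusing exponent grows with class frequency, so easy
    negatives of frequent classes are down-weighted more aggressively.
    This scaling rule is an assumption made for this sketch.
    """
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)
    class_freq = np.asarray(class_freq, dtype=float)
    # Per-class negative focus: frequent classes get a larger exponent.
    gamma_neg = gamma_neg_base * class_freq / class_freq.max()
    # Positive term: standard focal-style weighting with gamma_pos.
    pos_term = targets * (1.0 - probs) ** gamma_pos * np.log(probs)
    # Negative term: p^gamma_neg shrinks the gradient of easy negatives.
    neg_term = (1.0 - targets) * probs ** gamma_neg * np.log(1.0 - probs)
    return -(pos_term + neg_term).mean()
```

Raising `gamma_neg_base` suppresses the contribution of confidently rejected negatives, which is the mechanism the abstract invokes for balancing intra-class and inter-class performance.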
Journal description:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.