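HiProIBM: unsupervised continual learning through hierarchical prototypical cross-level discrimination along with information bottleneck subnetwork masking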

IF 3.5, CAS Zone 2 (Computer Science), JCR Q2 (Computer Science, Artificial Intelligence)

Applied Intelligence, Pub Date: 2025-02-25, DOI: 10.1007/s10489-025-06362-z

Ankit Malviya, Chandresh Kumar Maurya

Journal Article. Applied Intelligence, volume 55, issue 6. Full text: https://link.springer.com/article/10.1007/s10489-025-06362-z

Citations: 0


HiProIBM: unsupervised continual learning through hierarchical prototypical cross-level discrimination along with information bottleneck subnetwork masking

Catastrophic forgetting (CF) occurs when a machine learning model loses what it learned on previous tasks while learning new ones, because its retention mechanisms are inadequate. Unsupervised continual learning (UCL) addresses this by letting the model adapt to new tasks from unlabeled data while retaining past knowledge. To mitigate CF in UCL, we use a parameter isolation technique that masks a sub-network dedicated to each task, preventing interference with previous tasks. However, constructing these sub-networks from weight magnitude alone can retain irrelevant weights and create redundant sub-networks. It also risks capacity saturation and information suppression for tasks encountered later in the sequence. To overcome this, we use masked sub-networks inspired by the information bottleneck (IB) concept. IB masking accumulates valuable information into essential weights to construct redundancy-free sub-networks, which mitigates CF and enables training on new tasks. Without labels, however, IB sub-network masking struggles to balance input compression against the preservation of meaningful patterns. It risks overcompression and the loss of crucial latent structures, which degrades model performance. We address this by learning the multiple semantic hierarchies present in the data with unsupervised contrastive learning. Traditional contrastive learning techniques learn meaningful representations by contrasting similar and dissimilar data points, but they lack the representational power to model datasets with multiple semantic hierarchies. The hierarchical semantic structure inherent in a dataset is needed to merge semantically related clusters into larger, coarser-grained clusters, yet existing contrastive methods often overlook it, which limits semantic understanding. We address this by constructing and updating hierarchical prototypes with cross-level group discrimination to represent semantic structures in the latent space. Our experiments on four standard datasets show performance improvements over state-of-the-art (SOTA) baselines for task sequences of varying length, from 5 to 100 tasks, with nearly zero forgetting.
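The abstract's criticism of magnitude-only masking can be made concrete with a small sketch. The code below is not HiProIBM's IB masking; it illustrates only the baseline the abstract argues against, under simplifying assumptions: a single flat weight vector, weights that are never retrained between tasks, and a fixed per-task budget. The names `magnitude_mask`, `allocate_tasks`, and `keep_ratio` are hypothetical and do not come from the paper.

```python
import numpy as np

def magnitude_mask(weights, free, keep_ratio):
    """Pick the largest-magnitude weights among those no earlier task owns.

    weights:    1-D array of layer weights.
    free:       boolean array, True where the weight is still unclaimed.
    keep_ratio: fraction of ALL weights requested per task (hypothetical knob).
    """
    budget = int(round(keep_ratio * weights.size))
    free_idx = np.flatnonzero(free)
    # Capacity saturation: a late task may find fewer free weights than it asks for.
    budget = min(budget, free_idx.size)
    mask = np.zeros(weights.size, dtype=bool)
    if budget > 0:
        # Rank the free weights by absolute value, largest first.
        order = np.argsort(-np.abs(weights[free_idx]), kind="stable")
        mask[free_idx[order[:budget]]] = True
    return mask

def allocate_tasks(weights, num_tasks, keep_ratio):
    """Give each task a disjoint sub-network (parameter isolation)."""
    free = np.ones(weights.size, dtype=bool)
    masks = []
    for _ in range(num_tasks):
        task_mask = magnitude_mask(weights, free, keep_ratio)
        masks.append(task_mask)
        free &= ~task_mask  # freeze these weights so later tasks cannot overwrite them
    return masks

rng = np.random.default_rng(0)
w = rng.normal(size=100)
masks = allocate_tasks(w, num_tasks=5, keep_ratio=0.3)
sizes = [int(m.sum()) for m in masks]
print(sizes)  # [30, 30, 30, 10, 0]
```

Because the masks are disjoint, earlier tasks are never overwritten, but the fourth task receives only a third of its budget and the fifth receives nothing. This is the capacity-saturation effect the abstract attributes to sub-networks that keep more weights than they need.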


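Source journal

Applied Intelligence (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.60

Self-citation rate: 20.80%

Articles published: 1361

Review time: 5.9 months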

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.