Continual Learning From a Stream of APIs

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-20. DOI: 10.1109/TPAMI.2024.3460871

Enneng Yang, Zhenyi Wang, Li Shen, Nan Yin, Tongliang Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

Vol. 46, No. 12, pp. 11432-11445. Journal Article. https://ieeexplore.ieee.org/document/10684743/

Citations: 0

Abstract

Continual learning (CL) aims to learn new tasks without forgetting previous tasks. However, existing CL methods require a large amount of raw data, which is often unavailable due to copyright considerations and privacy risks. Instead, stakeholders usually release pre-trained machine learning models as a service (MLaaS), which users can access via APIs. This paper considers two practical yet novel CL settings: data-efficient CL (DECL-APIs) and data-free CL (DFCL-APIs), which perform CL from a stream of APIs with partial or no raw data. Performing CL under these two new settings faces several challenges: unavailable full raw data, unknown model parameters, heterogeneous models of arbitrary architecture and scale, and catastrophic forgetting of previous APIs. To overcome these issues, we propose a novel data-free cooperative continual distillation learning framework that distills knowledge from a stream of APIs into a CL model by generating pseudo data, using only API queries. Specifically, our framework includes two cooperative generators and one CL model, whose training is formulated as an adversarial game. We first use the CL model and the current API as fixed discriminators to train the generators via a derivative-free method. The generators adversarially produce hard and diverse synthetic data to maximize the response gap between the CL model and the API. Next, we train the CL model by minimizing the gap between the responses of the CL model and the black-box API on the synthetic data, which transfers the API's knowledge to the CL model. Furthermore, we propose a new regularization term based on network similarity to prevent catastrophic forgetting of previous APIs. In the DFCL-APIs setting, our method performs comparably to classic CL with full raw data on the MNIST and SVHN datasets. In the DECL-APIs setting, our method achieves $0.97\times$, $0.75\times$, and $0.69\times$ the performance of classic CL on the more challenging CIFAR10, CIFAR100, and MiniImageNet, respectively.
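The alternation described in the abstract (generators updated without API gradients to widen the response gap, then the CL model updated to close it) can be illustrated with a toy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the hidden linear-softmax `query_api` standing in for a black-box API, the single linear generator (the paper uses two cooperative generators), the random-direction finite-difference estimator as the derivative-free method, the cross-entropy student loss, and all names and hyperparameters. The similarity-based regularizer against forgetting is omitted because the abstract gives no details of it.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_CLASSES = 4, 3

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical stand-in for a black-box API: callers only see output probabilities.
_HIDDEN_W = rng.normal(size=(N_FEATURES, N_CLASSES))

def query_api(x):
    return softmax(x @ _HIDDEN_W)

def student(x, w):
    """Toy CL model: a linear softmax classifier with weights w."""
    return softmax(x @ w)

def gap(x, w):
    """Mean L1 distance between student and API responses on inputs x."""
    return np.abs(student(x, w) - query_api(x)).sum(axis=1).mean()

def generate(theta, z):
    """Toy generator: maps latent noise z to bounded synthetic inputs."""
    return np.tanh(z @ theta)

def generator_step(theta, w, z, sigma=0.1, lr=0.5, n_dirs=8):
    """Derivative-free ascent on the gap, using only forward queries.

    Antithetic random-direction finite differences (an assumed choice).
    """
    estimate = np.zeros_like(theta)
    for _ in range(n_dirs):
        u = rng.normal(size=theta.shape)
        up = gap(generate(theta + sigma * u, z), w)
        down = gap(generate(theta - sigma * u, z), w)
        estimate += (up - down) / (2 * sigma) * u
    return theta + lr * estimate / n_dirs  # ascent: make the data harder

def student_step(w, x, lr=0.5):
    """One gradient step on cross-entropy against the API's soft labels.

    Needs the API's outputs only, never its parameters or gradients.
    """
    p_api = query_api(x)
    grad = x.T @ (student(x, w) - p_api) / len(x)
    return w - lr * grad

def distill(steps=200, latent_dim=4, batch=64):
    """Alternate generator ascent and student descent; return student weights."""
    theta = 0.5 * rng.normal(size=(latent_dim, N_FEATURES))
    w = np.zeros((N_FEATURES, N_CLASSES))
    for _ in range(steps):
        z = rng.normal(size=(batch, latent_dim))
        theta = generator_step(theta, w, z)
        w = student_step(w, generate(theta, z))
    return w
```

Calling `distill()` returns the student weights; evaluating `gap` on held-out inputs before and after training gives a rough check that the student has moved toward the API's behavior without ever seeing real data.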


