{"title":"Online Continual Learning Benefits From Large Number of Task Splits","authors":"Shilin Zhang;Chenlin Yi","doi":"10.1109/TAI.2024.3405404","DOIUrl":null,"url":null,"abstract":"This work tackles the significant challenges inherent in online continual learning (OCL), a domain characterized by its handling of numerous tasks over extended periods. OCL is designed to adapt evolving data distributions and previously unseen classes through a single-pass analysis of a data stream, mirroring the dynamic nature of real-world applications. Despite its promising potential, existing OCL methodologies often suffer from catastrophic forgetting (CF) when confronted with a large array of tasks, compounded by substantial computational demands that limit their practical utility. At the heart of our proposed solution is the adoption of a kernel density estimation (KDE) learning framework, aimed at resolving the task prediction (TP) dilemma and ensuring the separability of all tasks. This is achieved through the incorporation of a linear projection head and a probability density function (PDF) for each task, while a shared backbone is maintained across tasks to provide raw feature representation. During the inference phase, we leverage an ensemble of PDFs, which utilizes a self-reporting mechanism based on maximum PDF values to identify the most appropriate model for classifying incoming instances. This strategy ensures that samples with identical labels are cohesively grouped within higher density PDF regions, effectively segregating dissimilar instances across the feature space of different tasks. Extensive experimental validation across diverse OCL datasets has underscored our framework's efficacy, showcasing remarkable performance enhancements and significant gains over existing methodologies, all achieved with minimal time-space overhead. 
Our approach introduces a scalable and efficient paradigm for OCL, addressing both the challenge of CF and computational efficiency, thereby extending the applicability of OCL to more realistic and demanding scenarios.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5746-5759"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10539923/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
This work tackles the significant challenges inherent in online continual learning (OCL), a domain characterized by its handling of numerous tasks over extended periods. OCL is designed to adapt to evolving data distributions and previously unseen classes through a single-pass analysis of a data stream, mirroring the dynamic nature of real-world applications. Despite its promising potential, existing OCL methodologies often suffer from catastrophic forgetting (CF) when confronted with a large array of tasks, compounded by substantial computational demands that limit their practical utility. At the heart of our proposed solution is the adoption of a kernel density estimation (KDE) learning framework, aimed at resolving the task prediction (TP) dilemma and ensuring the separability of all tasks. This is achieved by equipping each task with a linear projection head and a probability density function (PDF), while a shared backbone is maintained across tasks to provide raw feature representations. During the inference phase, we leverage an ensemble of PDFs, which uses a self-reporting mechanism based on maximum PDF values to identify the most appropriate model for classifying incoming instances. This strategy ensures that samples with identical labels are cohesively grouped within higher-density PDF regions, effectively segregating dissimilar instances across the feature spaces of different tasks. Extensive experimental validation across diverse OCL datasets underscores our framework's efficacy, showing significant gains over existing methodologies with minimal time and space overhead. Our approach introduces a scalable and efficient paradigm for OCL, addressing both CF and computational efficiency, thereby extending the applicability of OCL to more realistic and demanding scenarios.
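The inference mechanism described above — each task contributes a projection head plus a density estimate over its own data, and the task whose PDF assigns the highest density "self-reports" as the owner of an incoming sample — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection heads here are random (in the paper they are learned), the backbone is omitted, and `TaskModel`, `gaussian_kde_logpdf`, and all parameter values are hypothetical names chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kde_logpdf(train, x, bandwidth=0.5):
    """Log-density of x under an isotropic Gaussian KDE fit on `train` (n, d)."""
    d = train.shape[1]
    sq = np.sum((train - x) ** 2, axis=1) / (2.0 * bandwidth ** 2)
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    # log-mean-exp over the n kernel centers, computed stably
    return np.logaddexp.reduce(-sq + log_norm) - np.log(len(train))

class TaskModel:
    """One task: a linear projection head plus a KDE over projected features."""

    def __init__(self, dim, proj_dim):
        # Hypothetical stand-in for a learned projection head.
        self.W = rng.standard_normal((dim, proj_dim)) / np.sqrt(dim)
        self.features = None

    def fit(self, X):
        # Store this task's projected training features as KDE centers.
        self.features = X @ self.W

    def log_density(self, x):
        return gaussian_kde_logpdf(self.features, x @ self.W)

def predict_task(models, x):
    """Self-reporting ensemble: each task scores x; the max-density task wins."""
    scores = [m.log_density(x) for m in models]
    return int(np.argmax(scores))
```

With well-separated task distributions, a query sample lands in a high-density region of exactly one task's PDF, which is what makes the maximum-density rule a workable task-identity predictor.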