{"title":"课程数据集蒸馏","authors":"Zhiheng Ma;Anjia Cao;Funing Yang;Yihong Gong;Xing Wei","doi":"10.1109/TIP.2025.3579228","DOIUrl":null,"url":null,"abstract":"Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still performance bottlenecks and room for optimization in this direction. In this paper, we present a curriculum-based dataset distillation framework aiming to harmonize performance and scalability. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K. Our distilled datasets and code are available at <uri>https://github.com/MIV-XJTU/CUDD</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4176-4187"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Curriculum Dataset Distillation\",\"authors\":\"Zhiheng Ma;Anjia Cao;Funing Yang;Yihong Gong;Xing Wei\",\"doi\":\"10.1109/TIP.2025.3579228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still performance bottlenecks and room for optimization in this direction. In this paper, we present a curriculum-based dataset distillation framework aiming to harmonize performance and scalability. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K. 
Our distilled datasets and code are available at <uri>https://github.com/MIV-XJTU/CUDD</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"4176-4187\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11063689/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11063689/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods, but performance bottlenecks and room for optimization remain in this direction. In this paper, we present a curriculum-based dataset distillation framework that aims to harmonize performance and scalability. The framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the tendency of previous methods to generate homogeneous, simplistic images, and do so at a manageable computational cost. Furthermore, we introduce adversarial optimization of the synthetic images to further improve their representativeness and to safeguard against overfitting to the neural network used during distillation. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K. Our distilled datasets and code are available at https://github.com/MIV-XJTU/CUDD.
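
The abstract describes two ingredients: a simple-to-complex curriculum over the synthesis process, and an adversarial optimization step on the synthetic images so they do not overfit the network used for distillation. Below is a minimal illustrative sketch of that idea in PyTorch; it is not the authors' implementation. TinyNet, the hard-coded stage schedule, the epsilon values, and the cross-entropy synthesis objective are placeholder assumptions, and the paper's curriculum evaluation is reduced here to a fixed stage loop.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    # Placeholder for the (pretrained, frozen) network used during distillation.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def distill_with_curriculum(num_classes=10, ipc=1, stages=3, steps_per_stage=100):
    # Freeze the network; only the synthetic images are optimized.
    net = TinyNet(num_classes).eval()
    for p in net.parameters():
        p.requires_grad_(False)

    # Synthetic images (ipc = images per class) and their class labels.
    syn = torch.randn(num_classes * ipc, 3, 32, 32, requires_grad=True)
    labels = torch.arange(num_classes).repeat_interleave(ipc)
    opt = torch.optim.Adam([syn], lr=0.05)

    for stage in range(stages):
        # Curriculum stand-in: later stages use a larger adversarial budget,
        # so the synthesis problem moves from "simple" to "complex".
        eps = 0.01 * (stage + 1)
        for _ in range(steps_per_stage):
            opt.zero_grad()
            clean_loss = F.cross_entropy(net(syn), labels)

            # Adversarial step: nudge the images in the worst-case direction
            # for this network, then require them to remain well classified,
            # discouraging overfitting to this particular network.
            grad = torch.autograd.grad(clean_loss, syn, retain_graph=True)[0]
            adv = syn + eps * grad.detach().sign()
            adv_loss = F.cross_entropy(net(adv), labels)

            (clean_loss + adv_loss).backward()
            opt.step()

    return syn.detach(), labels

Calling distill_with_curriculum() returns the synthetic images and labels; in a real pipeline the distilled set would then be used to train fresh networks of various architectures, which is where the cross-architecture generalization claimed in the abstract would be measured.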