Dataset Distillation for Super-Resolution Without Class Labels and Pre-Trained Models

IF 3.9 · Zone 2 (Engineering & Technology) · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters, vol. 32, pp. 3700-3704 · Published 2025-09-18 · DOI: 10.1109/LSP.2025.3611694

Sunwoo Cho, Yejin Jung, Nam Ik Cho, Jae Woong Soh



Abstract

Training deep neural networks has become increasingly demanding. It requires large datasets and significant computational resources, especially as model complexity grows. Data distillation methods, which aim to improve data efficiency, have emerged as promising solutions to this challenge. In single image super-resolution (SISR), the reliance on large training datasets makes these techniques especially important. Recently, a data distillation framework for SR based on generative adversarial network (GAN) inversion was proposed, showing potential for better data utilization. However, that method depends heavily on pre-trained SR networks and class-specific information, which limits its generalizability and applicability. To address these issues, we introduce a new data distillation approach for image SR that needs neither class labels nor pre-trained SR models. Specifically, we first extract high-gradient patches and categorize images based on CLIP features. We then fine-tune a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experimental results show that our method achieves state-of-the-art performance while using significantly less training data and computational time. When we train a baseline Transformer model for SR with only 0.68% of the original dataset, the performance drop is just 0.3 dB. In this setting, diffusion model fine-tuning takes 4 hours and SR model training completes within 1 hour, much shorter than the 11 hours needed to train on the full dataset.
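The abstract names two data-selection steps: extracting high-gradient patches and grouping images by CLIP features. It does not specify the criteria for either step. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The patch size, stride, `keep_ratio`, the mean finite-difference gradient score, and spherical k-means over L2-normalized embeddings are all hypothetical choices. Computing CLIP embeddings is out of scope, so any (N, D) feature array stands in for them.

```python
import numpy as np


def gradient_score(patch: np.ndarray) -> float:
    """Mean gradient magnitude of a grayscale patch (finite differences).
    Hypothetical scoring rule; the paper's exact criterion may differ."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))


def extract_patches(img: np.ndarray, size: int, stride: int):
    """Yield (row, col, patch) for square patches taken on a regular grid."""
    h, w = img.shape
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, img[r:r + size, c:c + size]


def select_high_gradient_patches(img, size=64, stride=64, keep_ratio=0.25):
    """Return top-left corners of the `keep_ratio` fraction of patches with
    the highest gradient score (hypothetical parameters)."""
    scored = [(gradient_score(p), r, c) for r, c, p in extract_patches(img, size, stride)]
    scored.sort(key=lambda t: t[0], reverse=True)
    n_keep = max(1, int(round(keep_ratio * len(scored))))
    return [(r, c) for _, r, c in scored[:n_keep]]


def kmeans_cosine(features: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Spherical k-means on L2-normalized embeddings (e.g. CLIP image features).
    A stand-in for the paper's unspecified categorization step."""
    rng = np.random.default_rng(seed)
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        labels = np.argmax(x @ centers.T, axis=1)  # assign by cosine similarity
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                m = members.sum(axis=0)
                centers[j] = m / np.linalg.norm(m)
    return np.argmax(x @ centers.T, axis=1)  # labels for the final centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.zeros((128, 128))
    img[:64, :64] = rng.random((64, 64))  # only the top-left patch is textured
    print(select_high_gradient_patches(img))  # → [(0, 0)]

    feats = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                      [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
    print(kmeans_cosine(feats, k=2))  # two groups: rows {0, 1} and {2, 3}
```

In a pipeline along these lines, the selected patches would form the set on which the diffusion model is fine-tuned. That fine-tuning step is not sketched here.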




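Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months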

Journal description: The IEEE Signal Processing Letters is a monthly archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and at several workshops organized by the Signal Processing Society.