{"title":"Dataset condensation with coarse-to-fine regularization","authors":"Hyundong Jin, Eunwoo Kim","doi":"10.1016/j.patrec.2024.12.018","DOIUrl":null,"url":null,"abstract":"<div><div>State-of-the-art artificial intelligence models heavily rely on datasets with large numbers of samples, necessitating substantial memory allocation for data storage and high computational costs for model training. To alleviate storage and computational overheads, dataset condensation has recently gained attention. This approach encapsulates large samples into a more compact sample set while preserving the accuracy of a network trained on an entire sample set. Existing methods focus on aligning the output logits or network parameters trained on synthetic images with those of networks trained on real images. However, these approaches fail to encapsulate the diverse information because of their inability to account for relationships between synthetic images, leading to information redundancy between multiple synthetic images. To address these issues, we exploit the relationships among synthetic samples. This allows us to create diverse representations of synthetic images across distinct classes and to encourage diversity within the same class. We further promote diverse representations between synthetic image sub-regions. Experimental results with various datasets demonstrate that our method outperforms competitors by up to 12.2%. 
Moreover, the networks, which were not encountered during the condensation process, and were trained using our synthesized dataset, outperform other methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 178-184"},"PeriodicalIF":3.9000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865524003726","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citation count: 0
Abstract
State-of-the-art artificial intelligence models heavily rely on datasets with large numbers of samples, necessitating substantial memory for data storage and high computational costs for model training. To alleviate these storage and computational overheads, dataset condensation has recently gained attention. This approach encapsulates a large sample set into a much more compact synthetic set while preserving the accuracy of a network trained on the entire original set. Existing methods focus on aligning the output logits or parameters of networks trained on synthetic images with those of networks trained on real images. However, these approaches fail to capture diverse information because they do not account for relationships between synthetic images, leading to redundancy across multiple synthetic images. To address this issue, we exploit the relationships among synthetic samples. This allows us to create distinct representations for synthetic images of different classes and to encourage diversity within the same class. We further promote diverse representations between sub-regions of synthetic images. Experimental results on various datasets demonstrate that our method outperforms competitors by up to 12.2%. Moreover, networks that were not encountered during the condensation process, when trained on our synthesized dataset, outperform those trained with datasets condensed by other methods.
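The abstract's core idea, penalizing redundancy among synthetic samples at a coarse (between-class) and fine (within-class) level, can be sketched as a simple regularizer. This is a minimal illustration, not the paper's actual loss: it assumes flattened image vectors, uses cosine similarity as the redundancy measure, and the function names (`diversity_penalty`, `cosine_sim`) are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two flattened image vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def diversity_penalty(images, labels):
    """Hypothetical coarse-to-fine redundancy penalty on synthetic samples.

    Coarse term: average similarity between class-mean representations
    (encourages distinct representations across classes).
    Fine term: average similarity between samples of the same class
    (encourages diversity within a class).
    """
    classes = sorted(set(labels))
    by_class = {c: [x for x, y in zip(images, labels) if y == c] for c in classes}
    means = {c: np.mean(by_class[c], axis=0) for c in classes}

    coarse_pairs = [cosine_sim(means[a], means[b])
                    for i, a in enumerate(classes) for b in classes[i + 1:]]
    fine_pairs = [cosine_sim(xs[i], xs[j])
                  for xs in by_class.values()
                  for i in range(len(xs)) for j in range(i + 1, len(xs))]

    coarse = sum(coarse_pairs) / max(len(coarse_pairs), 1)
    fine = sum(fine_pairs) / max(len(fine_pairs), 1)
    return coarse + fine
```

In an actual condensation loop, a term like this would be added to the logit- or parameter-matching objective so that gradient updates on the synthetic images push them apart, reducing redundancy while the matching loss preserves accuracy.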
Journal introduction:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.