{"title":"DDFAD:音频数据的数据集蒸馏框架","authors":"Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu","doi":"arxiv-2407.10446","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have achieved significant success in numerous\napplications. The remarkable performance of DNNs is largely attributed to the\navailability of massive, high-quality training datasets. However, processing\nsuch massive training data requires huge computational and storage resources.\nDataset distillation is a promising solution to this problem, offering the\ncapability to compress a large dataset into a smaller distilled dataset. The\nmodel trained on the distilled dataset can achieve comparable performance to\nthe model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have\nexplored dataset distillation for audio data. In this work, for the first time,\nwe propose a Dataset Distillation Framework for Audio Data (DDFAD).\nSpecifically, we first propose the Fused Differential MFCC (FD-MFCC) as\nextracted features for audio data. After that, the FD-MFCC is distilled through\nthe matching training trajectory distillation method. Finally, we propose an\naudio signal reconstruction algorithm based on the Griffin-Lim Algorithm to\nreconstruct the audio signal from the distilled FD-MFCC. Extensive experiments\ndemonstrate the effectiveness of DDFAD on various audio datasets. In addition,\nwe show that DDFAD has promising application prospects in many applications,\nsuch as continual learning and neural architecture search.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DDFAD: Dataset Distillation Framework for Audio Data\",\"authors\":\"Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu\",\"doi\":\"arxiv-2407.10446\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) have achieved significant success in numerous\\napplications. The remarkable performance of DNNs is largely attributed to the\\navailability of massive, high-quality training datasets. However, processing\\nsuch massive training data requires huge computational and storage resources.\\nDataset distillation is a promising solution to this problem, offering the\\ncapability to compress a large dataset into a smaller distilled dataset. The\\nmodel trained on the distilled dataset can achieve comparable performance to\\nthe model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have\\nexplored dataset distillation for audio data. In this work, for the first time,\\nwe propose a Dataset Distillation Framework for Audio Data (DDFAD).\\nSpecifically, we first propose the Fused Differential MFCC (FD-MFCC) as\\nextracted features for audio data. After that, the FD-MFCC is distilled through\\nthe matching training trajectory distillation method. Finally, we propose an\\naudio signal reconstruction algorithm based on the Griffin-Lim Algorithm to\\nreconstruct the audio signal from the distilled FD-MFCC. Extensive experiments\\ndemonstrate the effectiveness of DDFAD on various audio datasets. 
In addition,\\nwe show that DDFAD has promising application prospects in many applications,\\nsuch as continual learning and neural architecture search.\",\"PeriodicalId\":501178,\"journal\":{\"name\":\"arXiv - CS - Sound\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Sound\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.10446\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.10446","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DDFAD: Dataset Distillation Framework for Audio Data
Deep neural networks (DNNs) have achieved significant success in numerous
applications. The remarkable performance of DNNs is largely attributed to the
availability of massive, high-quality training datasets. However, processing
such massive training data requires huge computational and storage resources.
Dataset distillation is a promising solution to this problem, offering the
capability to compress a large dataset into a smaller distilled dataset. The
model trained on the distilled dataset can achieve comparable performance to
the model trained on the whole dataset. While dataset distillation has been
demonstrated on image data, no prior work has explored dataset distillation
for audio data. In this work, we propose, for the first time, a Dataset
Distillation Framework for Audio Data (DDFAD).
Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as the
extracted feature representation for audio data.
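As a rough illustration of this step, here is a minimal sketch of FD-MFCC extraction. It assumes, based only on the name, that the fused feature stacks the MFCC with its first- and second-order differentials; the paper's exact fusion recipe may differ, and the function name `extract_fd_mfcc` is illustrative rather than the authors' code.

```python
# Minimal FD-MFCC sketch (assumption: "fused differential" = MFCC stacked
# with its first- and second-order deltas; the paper's recipe may differ).
import librosa
import numpy as np

def extract_fd_mfcc(path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)             # waveform at native rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(mfcc, order=1)       # first-order differential
    d2 = librosa.feature.delta(mfcc, order=2)       # second-order differential
    return np.concatenate([mfcc, d1, d2], axis=0)   # shape: (3 * n_mfcc, frames)
```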
The FD-MFCC is then distilled with the matching training trajectories (MTT)
distillation method.
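Trajectory matching optimizes the distilled data so that a student network trained on it follows the parameter trajectory of an expert network trained on the full dataset. Below is a heavily simplified, self-contained sketch of one outer MTT update: a single linear layer, plain SGD, and random tensors standing in for real expert checkpoints. All names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def mtt_loss(distilled_x, distilled_y, expert_start, expert_target,
             n_student_steps=10, student_lr=0.01):
    """Train a linear student from an expert checkpoint on the distilled
    data, then measure how close it lands to a later expert checkpoint."""
    w = expert_start.clone().requires_grad_(True)
    for _ in range(n_student_steps):
        loss = F.cross_entropy(distilled_x @ w, distilled_y)
        # create_graph=True keeps the graph so gradients can flow back
        # through the student updates into distilled_x
        (g,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - student_lr * g
    # normalized parameter-matching objective used by MTT
    return (w - expert_target).pow(2).sum() / \
           (expert_start - expert_target).pow(2).sum()

# Stand-ins: in practice these come from saved checkpoints of networks
# trained on the full dataset (expert trajectories), not random tensors.
expert_start = torch.randn(120, 10)    # expert params at step t
expert_target = torch.randn(120, 10)   # expert params at step t + M
distilled_x = torch.randn(50, 120, requires_grad=True)  # e.g. flattened FD-MFCC
distilled_y = torch.randint(0, 10, (50,))

opt = torch.optim.SGD([distilled_x], lr=0.1)
loss = mtt_loss(distilled_x, distilled_y, expert_start, expert_target)
opt.zero_grad()
loss.backward()   # gradients reach distilled_x through the unrolled updates
opt.step()
```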
Finally, we propose an audio signal reconstruction algorithm based on the
Griffin-Lim algorithm to reconstruct the audio signal from the distilled
FD-MFCC.
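For the reconstruction step, librosa provides a standard route from MFCC back to audio: invert the DCT to a mel spectrogram, invert the mel filterbank to an STFT magnitude, then estimate phase with the Griffin-Lim algorithm. The sketch below follows that standard route; the paper's own algorithm builds on Griffin-Lim and may add refinements, and the assumption that only the plain-MFCC block of the fused feature is needed for inversion is ours.

```python
import librosa
import numpy as np

def reconstruct_audio(fd_mfcc: np.ndarray, n_mfcc: int = 13,
                      sr: int = 16000, n_iter: int = 32) -> np.ndarray:
    # take the plain-MFCC block of the fused feature (assumption: the
    # delta blocks are derived from it and not needed for inversion)
    mfcc = fd_mfcc[:n_mfcc]
    # MFCC -> mel spectrogram -> STFT magnitude -> Griffin-Lim phase estimate
    return librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr, n_iter=n_iter)
```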
Extensive experiments demonstrate the effectiveness of DDFAD on various audio
datasets. In addition, we show that DDFAD holds promise for downstream
applications such as continual learning and neural architecture search.