DDFAD: Dataset Distillation Framework for Audio Data

Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu
{"title":"DDFAD: Dataset Distillation Framework for Audio Data","authors":"Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu","doi":"arxiv-2407.10446","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have achieved significant success in numerous\napplications. The remarkable performance of DNNs is largely attributed to the\navailability of massive, high-quality training datasets. However, processing\nsuch massive training data requires huge computational and storage resources.\nDataset distillation is a promising solution to this problem, offering the\ncapability to compress a large dataset into a smaller distilled dataset. The\nmodel trained on the distilled dataset can achieve comparable performance to\nthe model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have\nexplored dataset distillation for audio data. In this work, for the first time,\nwe propose a Dataset Distillation Framework for Audio Data (DDFAD).\nSpecifically, we first propose the Fused Differential MFCC (FD-MFCC) as\nextracted features for audio data. After that, the FD-MFCC is distilled through\nthe matching training trajectory distillation method. Finally, we propose an\naudio signal reconstruction algorithm based on the Griffin-Lim Algorithm to\nreconstruct the audio signal from the distilled FD-MFCC. Extensive experiments\ndemonstrate the effectiveness of DDFAD on various audio datasets. In addition,\nwe show that DDFAD has promising application prospects in many applications,\nsuch as continual learning and neural architecture search.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.10446","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets; however, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem: it compresses a large dataset into a much smaller distilled dataset, such that a model trained on the distilled dataset achieves performance comparable to a model trained on the whole dataset. While dataset distillation has been demonstrated on image data, no prior work has explored dataset distillation for audio data. In this work, we propose, for the first time, a Dataset Distillation Framework for Audio Data (DDFAD). Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as the extracted feature representation for audio data. The FD-MFCC features are then distilled using the matching training trajectories distillation method. Finally, we propose an audio signal reconstruction algorithm based on the Griffin-Lim Algorithm to recover the audio signal from the distilled FD-MFCC. Extensive experiments demonstrate the effectiveness of DDFAD on various audio datasets. In addition, we show that DDFAD has promising prospects in downstream applications such as continual learning and neural architecture search.
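The abstract outlines two audio-specific steps around the trajectory-matching distillation core: extracting FD-MFCC features and reconstructing waveforms with a Griffin-Lim-based algorithm. The sketch below is a minimal illustration of those two steps, not the authors' implementation. It assumes (since the abstract does not define it) that FD-MFCC fuses standard MFCCs with their first- and second-order differentials, and it uses librosa's Griffin-Lim-based MFCC inversion as a stand-in for the paper's reconstruction algorithm; all function names and parameters are illustrative.

```python
# Minimal sketch of FD-MFCC extraction and Griffin-Lim-based reconstruction.
# Assumptions: FD-MFCC = MFCC stacked with 1st/2nd-order differentials;
# n_mfcc and the example audio are illustrative, not the paper's settings.
import numpy as np
import librosa


def extract_fd_mfcc(y: np.ndarray, sr: int, n_mfcc: int = 40) -> np.ndarray:
    """Stack MFCCs with their first- and second-order differentials."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, T)
    delta1 = librosa.feature.delta(mfcc, order=1)           # 1st-order differential
    delta2 = librosa.feature.delta(mfcc, order=2)           # 2nd-order differential
    return np.concatenate([mfcc, delta1, delta2], axis=0)   # shape (3 * n_mfcc, T)


def reconstruct_audio(fd_mfcc: np.ndarray, sr: int, n_mfcc: int = 40) -> np.ndarray:
    """Recover a waveform from the MFCC portion of a (possibly distilled) FD-MFCC.

    librosa's mfcc_to_audio inverts MFCC -> mel spectrogram -> waveform and uses
    the Griffin-Lim algorithm for phase recovery, mirroring the Griffin-Lim-based
    reconstruction step described in the abstract (not the paper's exact algorithm).
    """
    mfcc = fd_mfcc[:n_mfcc]  # drop the differential channels before inversion
    return librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)


if __name__ == "__main__":
    # librosa's bundled example clip; any mono waveform works here.
    y, sr = librosa.load(librosa.example("trumpet"), sr=None)
    feats = extract_fd_mfcc(y, sr)
    print("FD-MFCC shape:", feats.shape)        # (120, T) with n_mfcc = 40
    y_hat = reconstruct_audio(feats, sr)
    print("Reconstructed samples:", y_hat.shape)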