DDFAD: Dataset Distillation Framework for Audio Data

Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu
{"title":"DDFAD:音频数据的数据集蒸馏框架","authors":"Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu","doi":"arxiv-2407.10446","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have achieved significant success in numerous\napplications. The remarkable performance of DNNs is largely attributed to the\navailability of massive, high-quality training datasets. However, processing\nsuch massive training data requires huge computational and storage resources.\nDataset distillation is a promising solution to this problem, offering the\ncapability to compress a large dataset into a smaller distilled dataset. The\nmodel trained on the distilled dataset can achieve comparable performance to\nthe model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have\nexplored dataset distillation for audio data. In this work, for the first time,\nwe propose a Dataset Distillation Framework for Audio Data (DDFAD).\nSpecifically, we first propose the Fused Differential MFCC (FD-MFCC) as\nextracted features for audio data. After that, the FD-MFCC is distilled through\nthe matching training trajectory distillation method. Finally, we propose an\naudio signal reconstruction algorithm based on the Griffin-Lim Algorithm to\nreconstruct the audio signal from the distilled FD-MFCC. Extensive experiments\ndemonstrate the effectiveness of DDFAD on various audio datasets. 
In addition,\nwe show that DDFAD has promising application prospects in many applications,\nsuch as continual learning and neural architecture search.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DDFAD: Dataset Distillation Framework for Audio Data\",\"authors\":\"Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu\",\"doi\":\"arxiv-2407.10446\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) have achieved significant success in numerous\\napplications. The remarkable performance of DNNs is largely attributed to the\\navailability of massive, high-quality training datasets. However, processing\\nsuch massive training data requires huge computational and storage resources.\\nDataset distillation is a promising solution to this problem, offering the\\ncapability to compress a large dataset into a smaller distilled dataset. The\\nmodel trained on the distilled dataset can achieve comparable performance to\\nthe model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have\\nexplored dataset distillation for audio data. In this work, for the first time,\\nwe propose a Dataset Distillation Framework for Audio Data (DDFAD).\\nSpecifically, we first propose the Fused Differential MFCC (FD-MFCC) as\\nextracted features for audio data. After that, the FD-MFCC is distilled through\\nthe matching training trajectory distillation method. Finally, we propose an\\naudio signal reconstruction algorithm based on the Griffin-Lim Algorithm to\\nreconstruct the audio signal from the distilled FD-MFCC. Extensive experiments\\ndemonstrate the effectiveness of DDFAD on various audio datasets. 
In addition,\\nwe show that DDFAD has promising application prospects in many applications,\\nsuch as continual learning and neural architecture search.\",\"PeriodicalId\":501178,\"journal\":{\"name\":\"arXiv - CS - Sound\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Sound\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.10446\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.10446","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem, offering the capability to compress a large dataset into a smaller distilled dataset. The model trained on the distilled dataset can achieve performance comparable to the model trained on the whole dataset. While dataset distillation has been demonstrated on image data, no prior work has explored dataset distillation for audio data. In this work, for the first time, we propose a Dataset Distillation Framework for Audio Data (DDFAD). Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as extracted features for audio data. After that, the FD-MFCC is distilled through the matching training trajectory distillation method. Finally, we propose an audio signal reconstruction algorithm based on the Griffin-Lim Algorithm to reconstruct the audio signal from the distilled FD-MFCC. Extensive experiments demonstrate the effectiveness of DDFAD on various audio datasets. In addition, we show that DDFAD has promising application prospects in many applications, such as continual learning and neural architecture search.
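The abstract does not define FD-MFCC precisely. A common way to build a "fused differential" feature, sketched below purely as an assumption about the construction, is to stack the static MFCC matrix with its first- and second-order regression deltas into a multi-channel feature map (the `delta` regression window `N=2` and the 3-channel fusion layout are illustrative choices, not taken from the paper):

```python
import numpy as np

def delta(feat, N=2):
    """First-order regression delta along the time axis of a (frames, coeffs) matrix."""
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Edge-pad in time so deltas are defined at the boundaries.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(T):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

def fused_differential_mfcc(mfcc):
    """Fuse static MFCC with its first- and second-order deltas as channels."""
    d1 = delta(mfcc)        # first-order differential
    d2 = delta(d1)          # second-order differential
    return np.stack([mfcc, d1, d2], axis=0)  # shape: (3, frames, coeffs)

# Stand-in for real MFCCs, e.g. from librosa.feature.mfcc: 100 frames x 13 coefficients.
mfcc = np.random.randn(100, 13)
fd = fused_differential_mfcc(mfcc)
print(fd.shape)  # (3, 100, 13)
```

For the final reconstruction step, `librosa.griffinlim` provides a standard Griffin-Lim implementation (and `librosa.feature.inverse.mfcc_to_audio` chains MFCC inversion with it), though the paper's own reconstruction algorithm may differ in its details.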