Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers

Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021-06-06
DOI: 10.1109/ICASSP39728.2021.9414018 (https://doi.org/10.1109/ICASSP39728.2021.9414018)
Citations: 1

Abstract

This paper addresses temporal dependency modeling in CNN-based models for audio classification. Rather than relying purely on architecture-induced mechanisms to capture audio temporal dependencies, we explicitly encode temporal dependencies into CNN-based audio classifiers. More specifically, in addition to the classification objective, we require the CNN model to solve an auxiliary task of predicting future features, formulated using the Contrastive Predictive Coding (CPC) loss. Furthermore, a novel hierarchical CPC (HCPC) model is proposed to capture temporal dependencies at multiple levels simultaneously. The proposed model is evaluated on a wide range of non-speech audio signals, including musical and in-the-wild environmental audio. We show that the proposed approach consistently improves the backbone CNNs on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.
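As an illustration of the joint objective described above, the following is a minimal sketch, not the authors' implementation. It assumes the common InfoNCE form of the CPC loss, in which a predicted future feature is scored against one true future feature and several negatives. The dot-product score, the function names, and the `aux_weight` balancing factor are all assumptions made for this example; the abstract does not specify them.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(u, v):
    """Dot product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (classification objective)."""
    return log_sum_exp(logits) - logits[label]


def info_nce(prediction, candidates, positive_index):
    """InfoNCE-style contrastive loss for one predicted future feature.

    `candidates` holds the true future feature (at `positive_index`) plus
    negatives. Scoring by a plain dot product is an assumption of this sketch.
    """
    scores = [dot(prediction, c) for c in candidates]
    return log_sum_exp(scores) - scores[positive_index]


def joint_loss(logits, label, prediction, candidates, positive_index,
               aux_weight=1.0):
    """Classification loss plus a weighted future-prediction (CPC) term.

    `aux_weight` is a hypothetical hyperparameter, not taken from the paper.
    """
    cls = cross_entropy(logits, label)
    cpc = info_nce(prediction, candidates, positive_index)
    return cls + aux_weight * cpc


# Usage: three-class logits, one predicted feature, one positive and one negative.
loss = joint_loss(
    logits=[2.0, 0.5, -1.0], label=0,
    prediction=[1.0, 0.0],
    candidates=[[5.0, 0.0], [0.0, 5.0]], positive_index=0,
)
```

In this form the auxiliary term is small when the prediction scores the true future feature higher than the negatives, so minimizing the sum pushes the model to encode information that is predictive of upcoming audio while still fitting the class labels.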