Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pub Date: 2021-06-06
DOI: 10.1109/ICASSP39728.2021.9414018

Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du


Citations: 1

Abstract

This paper addresses temporal dependency modeling in CNN-based models for audio classification. To capture temporal dependencies in audio with CNNs, we depart from purely architecture-induced methods and explicitly encode temporal dependencies into CNN-based audio classifiers. Specifically, in addition to the classification objective, we require the CNN model to solve an auxiliary task of predicting future features, formulated with the Contrastive Predictive Coding (CPC) loss. Furthermore, we propose a novel hierarchical CPC (HCPC) model that captures temporal dependencies at multiple levels simultaneously. The proposed model is evaluated on a wide range of non-speech audio signals, including musical and in-the-wild environmental audio. We show that the proposed approach consistently improves the backbone CNNs on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.
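
The auxiliary objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the predictor, the negative-sampling scheme, or the loss weighting, so the linear predictor `W`, the use of other batch items as negatives, and the `aux_weight` parameter are all assumptions made for illustration. The hierarchical (HCPC) variant is not shown.

```python
import numpy as np

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce_loss(context, future, W):
    """CPC-style (InfoNCE) loss for a single prediction step.

    context: (N, D_c) context vectors, one per sequence in the batch.
    future:  (N, D_z) encoded features of the true future frames.
    W:       (D_c, D_z) linear predictor (hypothetical, for illustration).
    Row i's positive sample is future[i]; the other rows serve as negatives.
    """
    predicted = context @ W            # (N, D_z) predicted future features
    scores = predicted @ future.T      # (N, N); scores[i, j] = predicted_i . future_j
    log_probs = log_softmax(scores, axis=1)
    return -np.mean(np.diag(log_probs))

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer class labels under the given logits."""
    log_probs = log_softmax(logits, axis=1)
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def joint_loss(class_logits, labels, context, future, W, aux_weight=1.0):
    """Classification loss plus a weighted future-prediction loss.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    return cross_entropy(class_logits, labels) + aux_weight * info_nce_loss(context, future, W)
```

In this sketch, the contrastive term is small only when each predicted feature scores higher against its own future frame than against the futures of other sequences, which is one way to push a classifier's features to carry information about what comes next.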



