Cure Dataset: Ladder Networks for Audio Event Classification

2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). Pub Date: 2019-08-01. DOI: 10.1109/PACRIM47961.2019.8985061

Harishchandra Dubey, Dimitra Emmanouilidou, I. Tashev


Citations: 3

Abstract



Audio event classification is an important task for applications such as surveillance and audio, video, and multimedia retrieval. Approximately 340 million people have hearing loss and cannot perceive events happening around them. This paper establishes the CURE dataset, a curated set of specific audio events most relevant to people with hearing loss. It is formatted as 5-second sound recordings derived from the Freesound project. We propose a ladder-network-based audio event classifier that uses state-of-the-art convolutional neural network (CNN) embeddings as audio features. We start with signal and feature normalization, which aims to reduce the mismatch between different recording scenarios. A CNN is first trained on weakly labeled AudioSet data; the pre-trained model is then used as a feature extractor for the proposed CURE corpus. We also explore the performance of an extreme learning machine (ELM) and use a support vector machine (SVM) as the baseline classifier. As a second evaluation set, we incorporate ESC-50. Results and discussion validate the superiority of the ladder network over the ELM and SVM classifiers in terms of robustness and classification accuracy.
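The abstract names feature normalization followed by an ELM classifier on fixed embeddings but gives no configuration. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes a standard single-hidden-layer ELM (random fixed hidden weights, output weights solved by ridge regression), and it uses synthetic Gaussian clusters as a stand-in for CNN embeddings. All names (`standardize`, `ELMClassifier`, `make_synthetic_embeddings`), the embedding size, and the hyperparameters are hypothetical. The ladder network itself is not shown.

```python
import numpy as np


def standardize(train, test):
    """Zero-mean, unit-variance scaling using training-set statistics only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-8  # guard against constant features
    return (train - mean) / std, (test - mean) / std


class ELMClassifier:
    """Generic ELM: random hidden layer, closed-form ridge output layer."""

    def __init__(self, n_hidden=256, ridge=1e-2, seed=0):
        self.n_hidden = n_hidden  # assumed value, not from the paper
        self.ridge = ridge        # assumed value, not from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random projection followed by a tanh nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Hidden weights are drawn once and never trained.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.W /= np.sqrt(n_features)  # keep pre-activations near unit scale
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Solve (H^T H + ridge * I) beta = H^T T for the output weights.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def make_synthetic_embeddings(n_per_class=100, n_classes=4, dim=128, seed=0):
    """Placeholder data: one Gaussian cluster per class (not real audio)."""
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((n_classes, dim)) * 3.0
    X = np.vstack([c + rng.standard_normal((n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    perm = rng.permutation(len(y))
    return X[perm], y[perm]


X, y = make_synthetic_embeddings()
split = int(0.8 * len(y))
X_train, X_test = standardize(X[:split], X[split:])
model = ELMClassifier().fit(X_train, y[:split])
accuracy = float((model.predict(X_test) == y[split:]).mean())
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Computing the normalization statistics on the training split alone keeps the held-out split untouched during fitting. The accuracy printed here reflects only the easy synthetic clusters and says nothing about performance on CURE or ESC-50.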

