Cure数据集:音频事件分类的阶梯网络

Harishchandra Dubey, Dimitra Emmanouilidou, I. Tashev
{"title":"Cure数据集:音频事件分类的阶梯网络","authors":"Harishchandra Dubey, Dimitra Emmanouilidou, I. Tashev","doi":"10.1109/PACRIM47961.2019.8985061","DOIUrl":null,"url":null,"abstract":"Audio event classification is an important task for several applications such as surveillance, audio, video and multimedia retrieval etc. There are approximately 340 million people with hearing loss who can’t perceive events happening around them. This paper establishes the CURE dataset which contains curated set of specific audio events most relevant for people with hearing loss. It is formatted as 5 sec sound recordings derived from the Freesound project. We propose a ladder network based audio event classifier. We adopted the state-of-the-art convolutional neural network (CNN) embeddings as audio features for this task. We start with signal and feature normalization that aims to reduce the mismatch between different recordings scenarios. Initially, a CNN is trained on weakly labeled Audioset data. Next, the pre-trained model is adopted as feature extractor for proposed CURE corpus. We also explore the performance of extreme learning machine (ELM) and use support vector machine (SVM) as baseline classifier. As a second evaluation set we incorporate ESC-50. Results and discussions validate the superiority of Ladder network over ELM and SVM classifier in terms of robustness and increased classification accuracy.","PeriodicalId":152556,"journal":{"name":"2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Cure Dataset: Ladder Networks for Audio Event Classification\",\"authors\":\"Harishchandra Dubey, Dimitra Emmanouilidou, I. Tashev\",\"doi\":\"10.1109/PACRIM47961.2019.8985061\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Audio event classification is an important task for several applications such as surveillance, audio, video and multimedia retrieval etc. There are approximately 340 million people with hearing loss who can’t perceive events happening around them. This paper establishes the CURE dataset which contains curated set of specific audio events most relevant for people with hearing loss. It is formatted as 5 sec sound recordings derived from the Freesound project. We propose a ladder network based audio event classifier. We adopted the state-of-the-art convolutional neural network (CNN) embeddings as audio features for this task. We start with signal and feature normalization that aims to reduce the mismatch between different recordings scenarios. Initially, a CNN is trained on weakly labeled Audioset data. Next, the pre-trained model is adopted as feature extractor for proposed CURE corpus. We also explore the performance of extreme learning machine (ELM) and use support vector machine (SVM) as baseline classifier. As a second evaluation set we incorporate ESC-50. Results and discussions validate the superiority of Ladder network over ELM and SVM classifier in terms of robustness and increased classification accuracy.\",\"PeriodicalId\":152556,\"journal\":{\"name\":\"2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)\",\"volume\":\"88 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACRIM47961.2019.8985061\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACRIM47961.2019.8985061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

音频事件分类是监控、音频、视频和多媒体检索等应用的重要内容。大约有3.4亿人患有听力损失,他们无法感知周围发生的事情。本文建立了CURE数据集,其中包含与听力损失患者最相关的特定音频事件的策划集。它被格式化为来自Freesound项目的5秒录音。提出了一种基于阶梯网络的音频事件分类器。我们采用了最先进的卷积神经网络(CNN)嵌入作为该任务的音频特征。我们从信号和特征规范化开始,旨在减少不同录音场景之间的不匹配。最初,CNN是在弱标记Audioset数据上训练的。然后,采用预训练模型作为CURE语料库的特征提取器。我们还探讨了极限学习机(ELM)的性能,并使用支持向量机(SVM)作为基线分类器。作为第二个评估集,我们纳入了ESC-50。结果和讨论验证了梯形网络在鲁棒性和分类精度方面优于ELM和SVM分类器。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Cure Dataset: Ladder Networks for Audio Event Classification
Audio event classification is an important task for several applications such as surveillance, audio, video and multimedia retrieval etc. There are approximately 340 million people with hearing loss who can’t perceive events happening around them. This paper establishes the CURE dataset which contains curated set of specific audio events most relevant for people with hearing loss. It is formatted as 5 sec sound recordings derived from the Freesound project. We propose a ladder network based audio event classifier. We adopted the state-of-the-art convolutional neural network (CNN) embeddings as audio features for this task. We start with signal and feature normalization that aims to reduce the mismatch between different recordings scenarios. Initially, a CNN is trained on weakly labeled Audioset data. Next, the pre-trained model is adopted as feature extractor for proposed CURE corpus. We also explore the performance of extreme learning machine (ELM) and use support vector machine (SVM) as baseline classifier. As a second evaluation set we incorporate ESC-50. Results and discussions validate the superiority of Ladder network over ELM and SVM classifier in terms of robustness and increased classification accuracy.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信