Poster: SlideCNN: Deep Learning for Auditory Spatial Scenes with Limited Annotated Data

Wenkai Li, Theo Gueuret, Beiyu Lin
{"title":"Poster: SlideCNN: Deep Learning for Auditory Spatial Scenes with Limited Annotated Data","authors":"Wenkai Li, Theo Gueuret, Beiyu Lin","doi":"10.1109/SEC54971.2022.00044","DOIUrl":null,"url":null,"abstract":"Sound is an important modality to perceive and understand the spatial environment. With the development of digital technology, massive amounts of smart devices in use around the world can collect sound data. Auditory spatial scenes, a spatial environment to understand and distinguish sound, are important to be detected by analyzing sounds collected via those devices. Given limited annotated auditory spatial samples, the current best-performing model can predict an auditory scene with an accuracy of 73%. We propose a novel yet simple Sliding Window based Convolutional Neural Network, SlideCNN, without manually designing features. SlideCNN leverages windowing operation to increase samples for limited annotation problems and improves the prediction accuracy by over 12% compared to the current best-performing models. It can detect real-life indoor and outdoor scenes with a 85% accuracy. The results will enhance practical applications of ML to analyze auditory scenes with limited annotated samples. It will further improve the recognition of environments that may potentially influence the safety of people, especially people with hearing aids and cochlear implant processors.","PeriodicalId":364062,"journal":{"name":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SEC54971.2022.00044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Sound is an important modality for perceiving and understanding the spatial environment. With the spread of digital technology, the massive number of smart devices in use around the world can collect sound data. Auditory spatial scenes, the spatial environments in which sounds are understood and distinguished, can be detected by analyzing the sounds these devices collect. Given limited annotated auditory spatial samples, the current best-performing model predicts an auditory scene with 73% accuracy. We propose a novel yet simple Sliding Window based Convolutional Neural Network, SlideCNN, which requires no manually designed features. SlideCNN leverages a windowing operation to increase the number of samples in limited-annotation problems and improves prediction accuracy by over 12% compared to the current best-performing models, detecting real-life indoor and outdoor scenes with 85% accuracy. These results will enhance practical applications of ML for analyzing auditory scenes with limited annotated samples. They will further improve the recognition of environments that may influence people's safety, especially for users of hearing aids and cochlear implant processors.
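The poster itself includes no code, but the core idea it describes, slicing each labeled clip into overlapping windows that all inherit the clip's scene label, is easy to illustrate. The sketch below is our own minimal reconstruction under assumed parameters (2-second windows with a 1-second hop, a toy CNN over spectrogram-shaped inputs); none of the function names, window sizes, or layer choices come from the paper.

```python
# Hypothetical sketch of sliding-window sample augmentation for scene
# classification. All parameters and layer choices are assumptions,
# not the authors' published SlideCNN configuration.
import numpy as np
import torch
import torch.nn as nn

def slide_windows(waveform: np.ndarray, win_len: int, hop: int) -> np.ndarray:
    """Split one labeled clip into overlapping fixed-length windows.

    Each window inherits the clip's scene label, turning a single
    annotated sample into many training samples.
    """
    n = 1 + max(0, (len(waveform) - win_len) // hop)
    return np.stack([waveform[i * hop : i * hop + win_len] for i in range(n)])

class SmallSceneCNN(nn.Module):
    """A small CNN over (1, freq, time) spectrogram windows; illustrative only."""
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool to (B, 32, 1, 1)
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# A 10 s clip at 16 kHz with 2 s windows and a 1 s hop yields 9 windows,
# i.e. a 9x increase in training samples from one annotation.
clip = np.random.randn(10 * 16000).astype(np.float32)
windows = slide_windows(clip, win_len=2 * 16000, hop=16000)
print(windows.shape)  # (9, 32000)

# In practice each window would be converted to a log-mel spectrogram
# before the CNN; here random tensors stand in for those features.
x = torch.randn(len(windows), 1, 64, 64)
logits = SmallSceneCNN(n_classes=10)(x)
print(logits.shape)  # torch.Size([9, 10])
```

At inference time, one plausible way to recover a clip-level decision from such a model is to aggregate the window-level predictions, for example by averaging the per-window logits across the clip.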