Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, K. Nakadai
{"title":"Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net","authors":"Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, K. Nakadai","doi":"10.1109/IEEECONF49454.2021.9382730","DOIUrl":null,"url":null,"abstract":"This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated method that deals with sound source localization, sound source separation and class identification. When multiple microphones are available, spatial features can be used to improve the separation accuracy of signals from different directions; however, conventional methods have two drawbacks: (a) Since sound source localization and sound source separation using spatial features and class identification using spectral features are trained in the same neural network, it overfits to the relationship between the direction of arrival and the class. (b) Although the permutation invariant training used in speech recognition could be extended, it is not practical for environmental sounds due to the maximum number of speakers limitation. This paper proposes multi-channel environmental sound segmentation method that combines U-Net which simultaneously performs sound source localization and sound source separation, and convolutional neural network which classifies the separated sounds. This method prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments using the created datasets including 75-class environmental sounds showed that the root mean squared error of the proposed method was lower than that of the conventional method.","PeriodicalId":395378,"journal":{"name":"2021 IEEE/SICE International Symposium on System Integration (SII)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/SICE International Symposium on System Integration (SII)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IEEECONF49454.2021.9382730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated task that combines sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can be used to improve the separation accuracy of signals arriving from different directions; however, conventional methods have two drawbacks: (a) because sound source localization and sound source separation using spatial features and class identification using spectral features are trained in a single neural network, the network overfits to the relationship between the direction of arrival and the class; (b) although the permutation invariant training used in speech recognition could be extended, it is not practical for environmental sounds because of the limitation on the maximum number of speakers. This paper proposes a multi-channel environmental sound segmentation method that combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network, which classifies the separated sounds. This design prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments on datasets created from 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of the conventional method.
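The abstract describes the architecture only at a high level. The PyTorch sketch below illustrates the two-stage design it outlines: a separation U-Net that maps multi-channel spectral features to per-source masks, followed by an independent CNN that classifies each separated sound. The 8-channel input, mask-based separation, fixed number of separated sources, and all layer widths are illustrative assumptions, not the paper's actual configuration; only the 75-class output matches the paper, and the localization output of the actual U-Net is omitted here.

```python
# Minimal sketch of the two-stage pipeline: separation U-Net, then a CNN
# classifier. Layer sizes, input channels, and num_sources are assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SeparationUNet(nn.Module):
    """U-Net mapping multi-channel features to per-source spectrogram masks."""
    def __init__(self, in_ch=8, num_sources=4):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.bottleneck = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, num_sources, 1)  # one magnitude mask per source

    def forward(self, x):                       # x: (B, in_ch, F, T)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.head(d1))     # masks in [0, 1], (B, num_sources, F, T)

class SoundClassifier(nn.Module):
    """CNN labeling each separated spectrogram with one of 75 classes (per the paper)."""
    def __init__(self, num_classes=75):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32), nn.MaxPool2d(2),
            conv_block(32, 64), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, spec):                    # spec: (B, 1, F, T)
        return self.fc(self.features(spec).flatten(1))

# Stage 1 separates; stage 2 classifies each separated source independently,
# so the class decision never sees spatial features or the direction of arrival.
unet, clf = SeparationUNet(), SoundClassifier()
feats = torch.randn(2, 8, 64, 64)               # batch of multi-channel features
masks = unet(feats)                             # (2, 4, 64, 64)
mixture_mag = feats[:, :1]                      # assume channel 0 is the mixture magnitude
separated = masks * mixture_mag                 # masked per-source spectrograms
logits = clf(separated.reshape(-1, 1, 64, 64))  # (2 * 4, 75)
```

Because the classifier receives only a separated single-channel spectrogram, the class decision is decoupled from the direction of arrival, which is the abstract's stated mechanism for avoiding drawback (a). The design also sidesteps permutation invariant training, whose loss must be evaluated under every source-to-target assignment and therefore scales factorially with the number of simultaneous sources.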