Speaker localization in conferencing systems employing phase features and wavelet transform

Rafal Samborski, M. Ziólko
Published in: Proceedings of the ... IEEE International Symposium on Signal Processing and Information Technology, pp. 333-337
Publication date: 2013-12-01
DOI: 10.1109/ISSPIT.2013.6781903 (https://doi.org/10.1109/ISSPIT.2013.6781903)
Citations: 2

Abstract

Some existing conference systems employ a distant microphone array instead of a dedicated microphone for each user. This approach is much more convenient, although it suffers from much higher noise sensitivity. One possible solution is to employ beamforming techniques to focus on the user who is speaking at the moment. However, a beamformer needs the direction of arrival (DOA), which is usually obtained by analysing the phase differences between signals. The effectiveness of such a solution decreases dramatically when the environment becomes noisy. In this paper, a novel, robust meeting diarization system is described. The decision about which user is speaking at the moment is based not only on spatial features of the signal (i.e., the speaker's location) but also on spectral features. The microphone array estimates the speaker's location using generalized cross-correlation with phase transform (GCC-PHAT). Additionally, a speaker recognition system employing the wavelet-Fourier transform (WFT) extracts spectral features of the voice. The described solution is much more robust than one based on speaker recognition or speaker localization alone. Experiments during meetings in a regular meeting room show that it is less sensitive to noise and that switching between speakers is several times faster.
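
The abstract names GCC-PHAT as the localization step but gives no implementation details. The sketch below illustrates the general technique only and is not the authors' code. It assumes a single pair of microphones, a far-field source, and a speed of sound of 343 m/s. The function names (`dft`, `gcc_phat_delay`, `delay_to_angle`), the 16 kHz sampling rate, and the 0.2 m microphone spacing are hypothetical choices made for illustration. A naive DFT stands in for an FFT library so that the example needs only the Python standard library.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (assumption: short demo frames only)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    # Precompute the n-th roots of unity; (k * t) % n selects the twiddle factor.
    roots = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = []
    for k in range(n):
        acc = sum(x[t] * roots[(k * t) % n] for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Return the lag in samples by which `sig` trails `ref` (negative if it leads)."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [a * b.conjugate() for a, b in zip(spec_sig, spec_ref)]
    # PHAT weighting: discard the magnitude and keep only the phase of each bin.
    whitened = [c / (abs(c) + 1e-12) for c in cross]
    cc = dft(whitened, inverse=True)
    # Non-negative lags sit at index l, negative lags at index n + l.
    return max(range(-max_lag, max_lag + 1), key=lambda l: cc[l % n].real)


def delay_to_angle(lag, fs, mic_distance, c=343.0):
    """Far-field arrival angle in degrees from broadside for one microphone pair."""
    s = c * lag / (fs * mic_distance)
    s = max(-1.0, min(1.0, s))  # clamp against rounding outside asin's domain
    return math.degrees(math.asin(s))


# Synthetic check: the second microphone receives the same noise burst 3 samples later.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(64)]
true_delay = 3
mic_a = source + [0.0] * true_delay
mic_b = [0.0] * true_delay + source

lag = gcc_phat_delay(mic_b, mic_a, max_lag=8)
angle = delay_to_angle(lag, fs=16000, mic_distance=0.2)
print("estimated lag (samples):", lag)
print("estimated angle (degrees):", round(angle, 1))
```

On this noise-free synthetic pair the estimated lag equals the delay that was inserted. The PHAT weighting keeps only phase information, which matches the abstract's description of DOA estimation from phase differences. The abstract does not say how the localization result is combined with the WFT-based speaker recognition, so that step is left out here.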