A Comparison of Boosted Deep Neural Networks for Voice Activity Detection

Harshit Krishnakumar, D. Williamson
Published in: 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
Publication date: 2019-11-01
DOI: 10.1109/GlobalSIP45357.2019.8969258 (https://doi.org/10.1109/GlobalSIP45357.2019.8969258)
Citations: 2

Abstract

Voice activity detection (VAD) is an integral part of speech processing for real-world problems, and considerable work has gone into improving VAD performance. Recently, deep neural networks have been used to detect the presence of speech, and this has offered substantial gains. However, these efforts have either been restricted to feed-forward neural networks, which do not adequately capture frequency and temporal correlations, or have relied on recurrent architectures that have not been adequately tested in noisy environments. In this paper, we investigate different neural network configurations for voice activity detection. More specifically, we explore solutions that incorporate multi-resolution stacking and ensemble learning using convolutional, long short-term memory (LSTM), and dilated convolutional neural network architectures. We evaluate our approach using various speech signals captured with different amounts of noise. Our results show that a multi-resolution ensemble approach using LSTM recurrent neural networks performs best, in both seen and unseen testing scenarios.
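The abstract does not say how the resolutions are defined or how the ensemble outputs are combined, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that a "resolution" corresponds to the width of the context window around each frame and that per-frame speech probabilities are combined by simple averaging. The names `context_window`, `ensemble_vad`, `half_widths`, and the toy scoring functions are all hypothetical; in practice each scoring function would be a trained network.

```python
import numpy as np


def context_window(features, half_width):
    """Stack each frame with its neighbours within +/- half_width frames.

    features: array of shape (num_frames, num_bins).
    Edges are padded by repeating the first and last frame.
    Returns an array of shape (num_frames, (2 * half_width + 1) * num_bins).
    """
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    shifted = [padded[offset:offset + num_frames]
               for offset in range(2 * half_width + 1)]
    return np.concatenate(shifted, axis=1)


def ensemble_vad(features, models, half_widths, threshold=0.5):
    """Average per-frame speech probabilities from one model per resolution.

    models: callables mapping a windowed feature matrix to a vector of
            per-frame speech probabilities (assumed; stand-ins for networks).
    half_widths: one context half-width per model (assumed definition of
                 "resolution").
    Returns (mean probability per frame, boolean speech decision per frame).
    """
    probs = [model(context_window(features, width))
             for model, width in zip(models, half_widths)]
    mean_prob = np.mean(probs, axis=0)
    return mean_prob, mean_prob >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(100, 40))  # 100 frames, 40 spectral bins

    def toy_scorer(windowed):
        # Placeholder for a trained model: logistic function of mean energy.
        return 1.0 / (1.0 + np.exp(-windowed.mean(axis=1)))

    prob, is_speech = ensemble_vad(frames, [toy_scorer] * 3, [1, 3, 5])
    print(prob.shape, is_speech.dtype)
```

Averaging is the simplest possible combiner and is used here only to keep the sketch short; a stacked or learned combiner would slot into `ensemble_vad` in place of `np.mean`.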