sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li
{"title":"sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks","authors":"Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li","doi":"10.1109/icassp48485.2024.10446945","DOIUrl":null,"url":null,"abstract":"Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a novel SNN-based VAD model, referred to as sVAD, which features an auditory encoder with an SNN-based attention mechanism. Particularly, it provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms. The classifier utilizes Spiking Recurrent Neural Networks (sRNN) to exploit temporal speech information. Experimental results demonstrate that our sVAD achieves remarkable noise robustness and meanwhile maintains low power consumption and a small footprint, making it a promising solution for real-world VAD applications.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"8 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icassp48485.2024.10446945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a novel SNN-based VAD model, referred to as sVAD, which features an auditory encoder with an SNN-based attention mechanism. Particularly, it provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms. The classifier utilizes Spiking Recurrent Neural Networks (sRNN) to exploit temporal speech information. Experimental results demonstrate that our sVAD achieves remarkable noise robustness and meanwhile maintains low power consumption and a small footprint, making it a promising solution for real-world VAD applications.
sVAD:利用尖峰神经网络进行鲁棒、低功耗和轻量级语音活动检测
语音应用需要低功耗,并能在噪声条件下保持稳定。有效的语音活动检测(VAD)前端可降低计算需求。众所周知,尖峰神经网络(SNN)具有生物合理性和高能效。然而,基于 SNN 的 VAD 尚未实现噪声鲁棒性,而且通常需要大型模型才能实现高性能。本文介绍了一种新颖的基于 SNN 的 VAD 模型(简称为 sVAD),其特点是听觉编码器具有基于 SNN 的注意机制。特别是,它通过 SincNet 和一维卷积提供了有效的听觉特征表示,并通过注意力机制提高了噪声鲁棒性。分类器利用尖峰递归神经网络(sRNN)来利用时态语音信息。实验结果表明,我们的 sVAD 具有显著的噪声鲁棒性,同时功耗低、占用空间小,是一种很有前途的实际 VAD 应用解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信