IF 3.4 · Zone 2 (Physics and Astrophysics) · Q1 Acoustics
Manjiri Bhat, R.B. Keskar
Journal: Applied Acoustics, volume 234, Article 110636
Publication date: 2025-03-01
Publication type: Journal Article
DOI: 10.1016/j.apacoust.2025.110636
URL: https://www.sciencedirect.com/science/article/pii/S0003682X25001082
Open access: no
Citations: 0

Self-supervised random forests for robust voice activity detection with limited labeled data
Voice activity detection is essential for various downstream speech-related applications. Deep learning models for voice activity detection and speech recognition exist, but they often require substantial annotated data and assume noise-free environments. This limitation hinders their application to the vast but sparsely labeled audio datasets available. To address this gap, we propose a novel approach: self-supervised random forest voice activity detection (SSRF-VAD), designed for noisy environments and limited labeled data. We integrate a set of five handcrafted features to optimize performance under mixed signal-to-noise ratios (SNRs). The study incorporates various noise classes covering diverse environmental sounds such as urban sounds, water sounds, indoor appliances, and animals. Using only 20% of the labeled training data, our SSRF-VAD approach improves the F1-score by 3% compared to the state-of-the-art MarbleNet model trained on the complete training dataset. Feature selection, implemented using two distinct feature importance techniques, SHAP and GINI, reduces the feature vector dimensionality by 75% while preserving accuracy. Further, a novel three-class classification that separates clean speech, noisy speech, and non-speech audio segments with the proposed technique achieves 98.74% accuracy with an F1-score of 0.982. This framework enhances speech analysis and noise characterization, contributing to efficient speech enhancement. Thus, the proposed SSRF-VAD method reduces the requirement for labeled data and can be implemented on resource-constrained devices such as smart hearing aids and smart home assistants.
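The abstract mentions evaluation under mixed signal-to-noise ratios but does not say how the mixtures were built. As an illustration only, the sketch below shows one common way to combine a speech signal with a noise signal at a chosen SNR, using the definition SNR (dB) = 10 log10(P_speech / P_noise). The function names `power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, and the sine tone and Gaussian noise are toy stand-ins for real recordings; none of this is taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Returns (mixture, scaled_noise). Hypothetical helper, not the paper's method.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Required noise power is p_speech / 10^(snr_db / 10); the gain is the
    # square root of the ratio between required and current noise power.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in decibels between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Toy signals (assumptions): one second at 16 kHz, a 440 Hz tone as "speech"
# and seeded Gaussian samples as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target_db in (-5, 0, 10):
    mixture, scaled_noise = mix_at_snr(speech, noise, target_db)
    print(target_db, round(measured_snr_db(speech, scaled_noise), 6))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, which makes mixtures at different SNRs directly comparable.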
Source journal: Applied Acoustics (Physics, Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
About the journal: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The journal aims to encourage the exchange of practical experience through publication, and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics publishes complete papers, short technical notes, and review articles. Manuscripts that address all fields of applications of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.