A Voice Activity Detection Algorithm Using Sparse Non-negative Matrix Factorization-based Model Learning in Spectro-Temporal Domain

Impact factor 1.5, JCR Q2 (Engineering, Multidisciplinary)
S. Mavaddati
International Journal of Engineering, journal article, published 2023-01-01
DOI: 10.5829/ije.2023.36.08b.08 (https://doi.org/10.5829/ije.2023.36.08b.08)

Abstract

Voice activity detectors separate the silence and speech segments of a speech signal so that different background noise signals can be eliminated. This paper proposes a novel voice activity detector that uses spectro-temporal features extracted from the auditory model of the speech signal. After the scale, rate, and frequency features are extracted from this feature space, a sparse structured principal component analysis algorithm is used to retain the basic components of these features and reduce the dimension of the learning data. The resulting feature vectors are then used to learn the models with the sparse non-negative matrix factorization algorithm. Model learning is carried out so that each feature vector is represented with a suitable sparsity rate over the selected atoms. Voice activity in the input frames is detected by computing the energy of each frame's sparse representation over the composite model. If the calculated energy exceeds a specified threshold, the input frame has a structure similar to the atoms of the learned models, and the frame is concluded to contain voice. The proposed detector was compared with baseline methods and classifiers in this field. The results were examined in the presence of stationary, non-stationary, and periodic noise, and they show that the proposed method based on model learning with spectro-temporal features can correctly detect
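The model-learning and thresholding steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the solver, the sparsity setting, or the threshold, so the sketch assumes multiplicative updates with an L1 penalty on the activations, uses synthetic non-negative vectors in place of the spectro-temporal features, and skips the sparse structured PCA step. The names `sparse_nmf`, `encode`, and `detect_voice`, and all parameter values, are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, seed=0):
    """Learn non-negative atoms W and activations H with V ~= W @ H.

    Assumption: multiplicative updates with an L1 penalty (`sparsity`) on H.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, n_atoms)) + EPS
    H = rng.random((n_atoms, n_frames)) + EPS
    for _ in range(n_iter):
        # The L1 penalty appears as an additive term in the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        # Keep atoms at unit norm so the penalty cannot be dodged by rescaling.
        W /= np.linalg.norm(W, axis=0, keepdims=True) + EPS
    return W, H


def encode(V, W, sparsity=0.1, n_iter=200):
    """Sparse activations of the frames in V over a fixed dictionary W."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
    return H


def detect_voice(V, W, threshold, sparsity=0.1):
    """Flag a frame as voiced when its activation energy exceeds `threshold`."""
    H = encode(V, W, sparsity)
    energy = np.sum(H ** 2, axis=0)
    return energy > threshold, energy


# Synthetic stand-in for training features: 20 dimensions, 100 frames.
rng = np.random.default_rng(1)
V_train = rng.random((20, 4)) @ (rng.random((4, 100)) + 0.5)
W, _ = sparse_nmf(V_train, n_atoms=4)

# One frame built from the training data and one all-zero (silent) frame.
frames = np.column_stack([V_train[:, 0], np.zeros(20)])
decisions, energy = detect_voice(frames, W, threshold=1.0)  # threshold is arbitrary
print(decisions, energy)
```

In practice the threshold would have to be tuned on held-out data, and the composite model mentioned in the abstract may combine more than one learned dictionary; the sketch uses a single dictionary for brevity.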
Source journal
International Journal of Engineering (Engineering, Multidisciplinary)
Self-citation rate
23.10%
Articles published
0
About the journal: The objective of the International Journal of Engineering is to provide a forum for communication of information among the world's scientific and technological community and Iranian scientists and engineers. The journal intends to be of interest and utility to researchers and practitioners in the academic, industrial, and governmental sectors. All original research contributions of significant value in all areas of the engineering discipline are welcome. The journal is published in two quarterly transactions. Transactions A (Basics) deals with engineering fundamentals. Transactions B (Applications) is concerned with the application of engineering knowledge in daily life, and Transactions C (Aspects), starting from January 2012, emphasizes the main engineering aspects whose elaboration can yield knowledge and expertise that serve all branches of the engineering discipline equally. The journal publishes authoritative papers on theoretical and experimental research and advanced applications embodying the results of extensive field, plant, laboratory, or theoretical investigation, or new interpretations of existing problems. It may also feature, when appropriate, research notes, technical notes, state-of-the-art survey papers, short communications, letters to the editor, meeting schedules, and conference announcements. The language of publication is English. Each paper should contain an abstract in both English and Persian; for authors who are not familiar with Persian, the publisher will prepare the translation. Abstracts should not exceed 250 words.