使用epstral特征的费舍尔向量表示法检测神经源性嗓音疾病

IF 2.5 4区 医学 Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY
Madhu Keerthana Yagnavajjula , Paavo Alku , Krothapalli Sreenivasa Rao , Pabitra Mitra
{"title":"使用epstral特征的费舍尔向量表示法检测神经源性嗓音疾病","authors":"Madhu Keerthana Yagnavajjula ,&nbsp;Paavo Alku ,&nbsp;Krothapalli Sreenivasa Rao ,&nbsp;Pabitra Mitra","doi":"10.1016/j.jvoice.2022.10.016","DOIUrl":null,"url":null,"abstract":"<div><div>Neurogenic voice disorders (NVDs) are caused by damage or malfunction of the central or peripheral nervous system that controls vocal fold movement. In this paper, we investigate the potential of the Fisher vector (FV) encoding in automatic detection of people with NVDs. FVs are used to convert features from frame level (local descriptors) to utterance level (global descriptors). At the frame level, we extract two popular cepstral representations, namely, Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction cepstral coefficients (PLPCCs), from acoustic voice signals. In addition, the MFCC features are also extracted from every frame of the glottal source signal computed using a glottal inverse filtering (GIF) technique. The global descriptors derived from the local descriptors are used to train a support vector machine (SVM) classifier. Experiments are conducted using voice signals from 80 healthy speakers and 80 patients with NVDs (40 with spasmodic dysphonia (SD) and 40 with recurrent laryngeal nerve palsy (RLNP)) taken from the Saarbruecken voice disorder (SVD) database. The overall results indicate that the use of the FV encoding leads to better identification of people with NVDs, compared to the defacto temporal encoding. Furthermore, the SVM trained using the combination of FVs derived from the cepstral and glottal features provides the overall best detection performance.</div></div>","PeriodicalId":49954,"journal":{"name":"Journal of Voice","volume":"39 3","pages":"Pages 757-763"},"PeriodicalIF":2.5000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Detection of Neurogenic Voice Disorders Using the Fisher Vector Representation of Cepstral Features\",\"authors\":\"Madhu Keerthana Yagnavajjula ,&nbsp;Paavo Alku ,&nbsp;Krothapalli Sreenivasa Rao ,&nbsp;Pabitra Mitra\",\"doi\":\"10.1016/j.jvoice.2022.10.016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Neurogenic voice disorders (NVDs) are caused by damage or malfunction of the central or peripheral nervous system that controls vocal fold movement. In this paper, we investigate the potential of the Fisher vector (FV) encoding in automatic detection of people with NVDs. FVs are used to convert features from frame level (local descriptors) to utterance level (global descriptors). At the frame level, we extract two popular cepstral representations, namely, Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction cepstral coefficients (PLPCCs), from acoustic voice signals. In addition, the MFCC features are also extracted from every frame of the glottal source signal computed using a glottal inverse filtering (GIF) technique. The global descriptors derived from the local descriptors are used to train a support vector machine (SVM) classifier. Experiments are conducted using voice signals from 80 healthy speakers and 80 patients with NVDs (40 with spasmodic dysphonia (SD) and 40 with recurrent laryngeal nerve palsy (RLNP)) taken from the Saarbruecken voice disorder (SVD) database. The overall results indicate that the use of the FV encoding leads to better identification of people with NVDs, compared to the defacto temporal encoding. Furthermore, the SVM trained using the combination of FVs derived from the cepstral and glottal features provides the overall best detection performance.</div></div>\",\"PeriodicalId\":49954,\"journal\":{\"name\":\"Journal of Voice\",\"volume\":\"39 3\",\"pages\":\"Pages 757-763\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2025-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Voice\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0892199722003228\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Voice","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0892199722003228","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

神经源性嗓音疾病(NVDs)是由于控制声带运动的中枢或外周神经系统受损或功能失常造成的。在本文中,我们研究了费舍尔向量(FV)编码在自动检测 NVDs 患者方面的潜力。费雪向量用于将帧级别(局部描述符)的特征转换为语句级别(全局描述符)的特征。在帧级,我们从声学语音信号中提取两种常用的共谱表示,即梅尔频率共谱系数(MFCC)和感知线性预测共谱系数(PLPCC)。此外,还使用声门反滤波(GIF)技术从声源信号的每一帧计算中提取 MFCC 特征。从局部描述符得出的全局描述符用于训练支持向量机(SVM)分类器。实验使用了从萨尔布吕肯嗓音疾病(SVD)数据库中提取的 80 位健康说话者和 80 位 NVD 患者(40 位痉挛性发音障碍患者和 40 位喉返神经麻痹患者)的嗓音信号。总体结果表明,与事实时间编码相比,使用 FV 编码能更好地识别 NVD 患者。此外,使用epstral和声门特征得出的FVs组合训练的SVM具有最佳的整体检测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Detection of Neurogenic Voice Disorders Using the Fisher Vector Representation of Cepstral Features
Neurogenic voice disorders (NVDs) are caused by damage or malfunction of the central or peripheral nervous system that controls vocal fold movement. In this paper, we investigate the potential of the Fisher vector (FV) encoding in automatic detection of people with NVDs. FVs are used to convert features from frame level (local descriptors) to utterance level (global descriptors). At the frame level, we extract two popular cepstral representations, namely, Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction cepstral coefficients (PLPCCs), from acoustic voice signals. In addition, the MFCC features are also extracted from every frame of the glottal source signal computed using a glottal inverse filtering (GIF) technique. The global descriptors derived from the local descriptors are used to train a support vector machine (SVM) classifier. Experiments are conducted using voice signals from 80 healthy speakers and 80 patients with NVDs (40 with spasmodic dysphonia (SD) and 40 with recurrent laryngeal nerve palsy (RLNP)) taken from the Saarbruecken voice disorder (SVD) database. The overall results indicate that the use of the FV encoding leads to better identification of people with NVDs, compared to the defacto temporal encoding. Furthermore, the SVM trained using the combination of FVs derived from the cepstral and glottal features provides the overall best detection performance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Voice
Journal of Voice 医学-耳鼻喉科学
CiteScore
4.00
自引率
13.60%
发文量
395
审稿时长
59 days
期刊介绍: The Journal of Voice is widely regarded as the world''s premiere journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists'' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信