Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

IF 1.8, Zone 4 (Medicine), Q4 ENGINEERING, BIOMEDICAL

Biomedical Engineering / Biomedizinische Technik. Pub Date: 2021-11-29. Print Date: 2021-12-20. DOI: 10.1515/bmt-2021-0112

Lei Geng, Hongfeng Shan, Zhitao Xiao, Wei Wang, Mei Wei

Volume 66, Issue 6, Pages 613-625

Citations: 2

Abstract


Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.

Automatic voice pathology detection and classification play an important role in the diagnosis and prevention of voice disorders. To describe the pronunciation characteristics of patients with dysarthria more accurately and to improve pathological voice detection, this study proposes a detection method based on a multimodal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via a short-time Fourier transform (STFT). A Mel filter bank is then applied to the spectrograms to enhance the signals' harmonics and reduce noise. Second, a pre-trained convolutional neural network (CNN) serves as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To improve classification, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. On the Saarbrucken Voice Database (SVD), the proposed system achieves 95.73% accuracy, a 96.10% F1-score, and 96.73% recall, offering a new method for pathological speech detection.
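The first stage described in the abstract (an STFT followed by a Mel filter bank, applied to both the speech and the EGG signal) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the frame length, hop size, number of Mel bands, the common `2595 * log10(1 + f / 700)` Mel formula, the log compression, the synthetic sine-wave inputs, and every function name are assumptions made for the example. A naive DFT is used for readability; a real pipeline would most likely rely on an FFT library.

```python
import cmath
import math


def hz_to_mel(f):
    # Widely used Mel-scale formula (an assumption; the abstract names none).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def stft_power(signal, frame_len, hop):
    """Hann-windowed short-time power spectrum, one row per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame, kept simple for clarity.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames


def mel_filter_bank(n_mels, frame_len, sample_rate):
    """Triangular filters whose edges are evenly spaced on the Mel scale."""
    n_bins = frame_len // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / frame_len for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        filt = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            filt.append(max(0.0, min(rising, falling)))
        bank.append(filt)
    return bank


def log_mel(signal, sample_rate, frame_len=64, hop=32, n_mels=8):
    """Log-Mel spectrogram: frames x Mel bands."""
    power = stft_power(signal, frame_len, hop)
    bank = mel_filter_bank(n_mels, frame_len, sample_rate)
    return [[math.log(sum(w * p for w, p in zip(filt, row)) + 1e-10)
             for filt in bank]
            for row in power]


# Synthetic stand-ins for the two modalities (hypothetical data, not SVD).
sr = 8000
speech = [math.sin(2 * math.pi * 200 * t / sr)
          + 0.5 * math.sin(2 * math.pi * 400 * t / sr) for t in range(512)]
egg = [math.sin(2 * math.pi * 200 * t / sr) for t in range(512)]

speech_mel = log_mel(speech, sr)
egg_mel = log_mel(egg, sr)
```

Each call returns a list of frames, each holding `n_mels` log-energies. In the pipeline the abstract describes, two such time-frequency maps (one per modality) would be the inputs to the CNN backbone.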


Source journal

Biomedical Engineering / Biomedizinische Technik (Medicine, Engineering: Biomedical)

CiteScore: 3.50

Self-citation rate: 5.90%

Articles published: not listed

Review time: 2-3 weeks

Journal description: Biomedical Engineering / Biomedizinische Technik (BMT) is a high-quality forum for the exchange of knowledge in the fields of biomedical engineering, medical information technology and biotechnology/bioengineering. As an established journal with a tradition of more than 60 years, BMT addresses engineers, natural scientists, and clinicians working in research, industry, or clinical practice.