Lei Geng, Hongfeng Shan, Zhitao Xiao, Wei Wang, Mei Wei
Biomedical Engineering / Biomedizinische Technik, vol. 66, no. 6, pp. 613-625. Published 2021-11-29. DOI: https://doi.org/10.1515/bmt-2021-0112
Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.
Automatic voice pathology detection and classification play an important role in the diagnosis and prevention of voice disorders. To describe the pronunciation characteristics of patients with dysarthria more accurately and to improve pathological voice detection, this study proposes a detection method based on a multimodal network structure. First, speech signals and electroglottography (EGG) signals are converted from the time domain into frequency-domain spectrograms via a short-time Fourier transform (STFT). A Mel filter bank is then applied to each spectrogram to enhance the signal's harmonics and reduce noise. Second, a pre-trained convolutional neural network (CNN) serves as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To improve classification, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. On the Saarbrucken Voice Database (SVD), the proposed system achieves 95.73% accuracy, a 96.10% F1-score, and 96.73% recall, offering a new method for pathological speech detection.
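The front end described in the abstract (STFT followed by a Mel filter bank) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the frame length, hop size, number of Mel bands, the 16 kHz sample rate, and the synthetic sine tones standing in for speech and EGG recordings are all assumptions chosen for illustration, and the CNN and LSTM stages are not covered.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def stft_power(signal, frame_len=512, hop=256):
    """Power spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def mel_filter_bank(n_mels, frame_len, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_bins)."""
    n_bins = frame_len // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Keep the lower of the two slopes and zero out everything outside the triangle.
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(signal, sample_rate, frame_len=512, hop=256, n_mels=40):
    """STFT power spectrogram projected onto mel bands, then log-compressed."""
    power = stft_power(signal, frame_len, hop)
    bank = mel_filter_bank(n_mels, frame_len, sample_rate)
    return np.log(power @ bank.T + 1e-10)  # small offset avoids log(0)


# Hypothetical stand-ins for one second of speech and EGG audio (not SVD data).
sample_rate = 16_000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
speech_like = np.sin(2 * np.pi * 200.0 * t) + 0.5 * np.sin(2 * np.pi * 400.0 * t)
egg_like = np.sin(2 * np.pi * 200.0 * t)

speech_mel = log_mel_spectrogram(speech_like, sample_rate)
egg_mel = log_mel_spectrogram(egg_like, sample_rate)
print(speech_mel.shape, egg_mel.shape)
```

Each modality yields its own time-by-Mel-band array, which is the kind of image-like input a CNN backbone would typically consume. How the paper fuses the two feature streams is not specified in the abstract, so the sketch stops before that step.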
About the journal:
Biomedical Engineering / Biomedizinische Technik (BMT) is a high-quality forum for the exchange of knowledge in the fields of biomedical engineering, medical information technology and biotechnology/bioengineering. As an established journal with a tradition of more than 60 years, BMT addresses engineers, natural scientists, and clinicians working in research, industry, or clinical practice.