Pathological Voice Detection and Classification Based on Multimodal Transmission Network
Lei Geng, Yan Liang, Hongfeng Shan, Zhitao Xiao, Wei Wang, Mei Wei
Journal of Voice, 39(3), pp. 591-601, May 2025. DOI: 10.1016/j.jvoice.2022.11.018
Abstract
Objectives
Describing pronunciation features from multiple perspectives can help doctors accurately diagnose the type of pathology in a patient's voice. Using information from two modalities, the sound signal and the electroglottography (EGG) signal, this paper proposes a pathological voice detection and classification algorithm based on a multimodal transmission network.
Methods
First, we used the short-time Fourier transform (STFT) to map both signals into the time-frequency domain and designed a Mel filter bank to obtain their Mel spectrograms. Next, the multimodal transmission network extracted features from the Mel spectrograms and applied the Multimodal Transfer Module (MMTM). Finally, a fusion layer integrated the multimodal information, and a fully connected layer detected and classified voice pathology from the fused features.
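The Methods steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the STFT or filter-bank settings, so the Hann window, `n_fft=512`, `hop=128`, 40 HTK-style Mel bands, and log compression are hypothetical choices. In the paper the MMTM acts on intermediate feature maps of the two network branches; here `mmtm` follows the squeeze-and-excitation form of MMTM as we understand it (global average pooling per modality, a shared joint layer, then per-channel gates of `2 * sigmoid(...)`), with an assumed ReLU, no biases, and untrained weight matrices `w_z`, `w_a`, `w_b`. The helper `fuse_and_classify` is hypothetical and assumes fusion by concatenating pooled features; the paper's actual fusion layer may differ.

```python
import numpy as np

# Illustrative sketch with hypothetical parameters; not the authors' code.


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_mels, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, value 1 at the center bin
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def stft_power(x, n_fft=512, hop=128):
    """Power spectrogram of a Hann-windowed STFT, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=40):
    """Project STFT power onto the mel filter bank, then log-compress."""
    mel = stft_power(x, n_fft, hop) @ mel_filter_bank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def mmtm(a, b, w_z, w_a, w_b):
    """MMTM-style cross-modal gating; a: (C_a, H, W), b: (C_b, H2, W2) feature maps."""
    s = np.concatenate([a.mean(axis=(1, 2)), b.mean(axis=(1, 2))])  # squeeze each modality
    z = np.maximum(w_z @ s, 0.0)  # joint representation (ReLU assumed, biases omitted)
    g_a = 2.0 * sigmoid(w_a @ z)  # per-channel gates for modality A
    g_b = 2.0 * sigmoid(w_b @ z)  # per-channel gates for modality B
    return a * g_a[:, None, None], b * g_b[:, None, None]


def fuse_and_classify(a, b, w_fc, bias):
    """Hypothetical fusion: concatenate pooled features, fully connected layer, softmax."""
    fused = np.concatenate([a.mean(axis=(1, 2)), b.mean(axis=(1, 2))])
    logits = w_fc @ fused + bias
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    return p / p.sum()
```

Each modality would be transformed separately, for example `log_mel_spectrogram(voice, sr)` and `log_mel_spectrogram(egg, sr)` for hypothetical arrays `voice` and `egg`. Because the gates lie between 0 and 2, a gate of 1 leaves a channel unchanged, so all-zero weights make `mmtm` an identity map; training moves the gates away from 1 so that each modality can amplify or suppress channels of the other.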
Results
Experiments were conducted on 1179 subjects from the Saarbrücken Voice Database (SVD). The average accuracy, recall, specificity, and F1 score for pathological voice classification reached 98.02%, 98.23%, 97.82%, and 97.95%, respectively. Compared with other algorithms, classification accuracy was significantly improved.
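For reference, the four reported metrics can be computed from a binary confusion matrix as below, treating pathological voices as the positive class. This is a sketch: the helper name `binary_metrics` is our own, the abstract does not say how the averages were taken (for example, across cross-validation folds or across classes), and the counts in the usage example are hypothetical, not data from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Hypothetical helper (not from the paper): metrics from a binary confusion matrix,
    with 'pathological' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)  # sensitivity: pathological voices correctly detected
    specificity = tn / (tn + fp)  # healthy voices correctly recognized
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity, "f1": f1}
```

For instance, `binary_metrics(tp=90, fp=20, tn=80, fn=10)` yields an accuracy of 0.85, a recall of 0.9, a specificity of 0.8, and an F1 score of 6/7 (about 0.857). Note that F1 depends on precision, not specificity, so a model can report all four values only if precision is also tracked.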
Conclusions
The proposed model integrates information from multiple modalities to obtain more comprehensive and stable voice features, improving the accuracy of pathological voice classification. Future research will focus on reducing the computation time and complexity of the model.
About the journal:
The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.