Meimei Chen, Yang Wang, Huangwei Lei, Fei Zhang, Ruina Huang, Zhaoyang Yang
{"title":"[基于多种机器学习算法和声音情绪特征的阈下抑郁识别模型构建]。","authors":"Meimei Chen, Yang Wang, Huangwei Lei, Fei Zhang, Ruina Huang, Zhaoyang Yang","doi":"10.12122/j.issn.1673-4254.2025.04.05","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>To construct vocal recognition classification models using 6 machine learning algorithms and vocal emotional characteristics of individuals with subthreshold depression to facilitate early identification of subthreshold depression.</p><p><strong>Methods: </strong>We collected voice data from both normal individuals and participants with subthreshold depression by asking them to read specifically chosen words and texts. From each voice sample, 384-dimensional vocal emotional feature variables were extracted, including energy feature, Meir frequency cepstrum coefficient, zero cross rate feature, sound probability feature, fundamental frequency feature, difference feature. The Recursive Feature Elimination (RFE) method was employed to select voice feature variables. Classification models were then built using the machine learning algorithms Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and the performance of these models was evaluated. To assess generalization capability of the models, we used real-world speech data to evaluate the best speech recognition classification model.</p><p><strong>Results: </strong>The AdaBoost, RF, and LDA models achieved high prediction accuracies of 100%, 100%, and 93.3% on word-reading speech test set, respectively. In the text-reading speech test set, the accuracies of the AdaBoost, RF, and LDA models were 90%, 80%, and 90%, respectively, while the accuracies of the other 3 models were all below 80%. On real-world word-reading and text-reading speech data, the classification models using AdaBoost and Random Forest still achieved high predictive accuracies (91.7% and 80.6% for AdaBoost and 86.1% and 77.8% for Random, respectively).</p><p><strong>Conclusions: </strong>Analyzing vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models show excellent performance for classifying subthreshold depression individuals, and may thus potentially offer valuable assistance in the clinical and research settings.</p>","PeriodicalId":18962,"journal":{"name":"南方医科大学学报杂志","volume":"45 4","pages":"711-717"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037279/pdf/","citationCount":"0","resultStr":"{\"title\":\"[Construction of recognition models for subthreshold depression based on multiple machine learning algorithms and vocal emotional characteristics].\",\"authors\":\"Meimei Chen, Yang Wang, Huangwei Lei, Fei Zhang, Ruina Huang, Zhaoyang Yang\",\"doi\":\"10.12122/j.issn.1673-4254.2025.04.05\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>To construct vocal recognition classification models using 6 machine learning algorithms and vocal emotional characteristics of individuals with subthreshold depression to facilitate early identification of subthreshold depression.</p><p><strong>Methods: </strong>We collected voice data from both normal individuals and participants with subthreshold depression by asking them to read specifically chosen words and texts. From each voice sample, 384-dimensional vocal emotional feature variables were extracted, including energy feature, Meir frequency cepstrum coefficient, zero cross rate feature, sound probability feature, fundamental frequency feature, difference feature. The Recursive Feature Elimination (RFE) method was employed to select voice feature variables. Classification models were then built using the machine learning algorithms Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and the performance of these models was evaluated. To assess generalization capability of the models, we used real-world speech data to evaluate the best speech recognition classification model.</p><p><strong>Results: </strong>The AdaBoost, RF, and LDA models achieved high prediction accuracies of 100%, 100%, and 93.3% on word-reading speech test set, respectively. In the text-reading speech test set, the accuracies of the AdaBoost, RF, and LDA models were 90%, 80%, and 90%, respectively, while the accuracies of the other 3 models were all below 80%. On real-world word-reading and text-reading speech data, the classification models using AdaBoost and Random Forest still achieved high predictive accuracies (91.7% and 80.6% for AdaBoost and 86.1% and 77.8% for Random, respectively).</p><p><strong>Conclusions: </strong>Analyzing vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models show excellent performance for classifying subthreshold depression individuals, and may thus potentially offer valuable assistance in the clinical and research settings.</p>\",\"PeriodicalId\":18962,\"journal\":{\"name\":\"南方医科大学学报杂志\",\"volume\":\"45 4\",\"pages\":\"711-717\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037279/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"南方医科大学学报杂志\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.12122/j.issn.1673-4254.2025.04.05\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"南方医科大学学报杂志","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12122/j.issn.1673-4254.2025.04.05","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
[Construction of recognition models for subthreshold depression based on multiple machine learning algorithms and vocal emotional characteristics].
Objectives: To construct vocal recognition classification models using 6 machine learning algorithms and vocal emotional characteristics of individuals with subthreshold depression to facilitate early identification of subthreshold depression.
Methods: We collected voice data from both normal individuals and participants with subthreshold depression by asking them to read specifically chosen words and texts. From each voice sample, 384-dimensional vocal emotional feature variables were extracted, including energy feature, Meir frequency cepstrum coefficient, zero cross rate feature, sound probability feature, fundamental frequency feature, difference feature. The Recursive Feature Elimination (RFE) method was employed to select voice feature variables. Classification models were then built using the machine learning algorithms Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and the performance of these models was evaluated. To assess generalization capability of the models, we used real-world speech data to evaluate the best speech recognition classification model.
Results: The AdaBoost, RF, and LDA models achieved high prediction accuracies of 100%, 100%, and 93.3% on word-reading speech test set, respectively. In the text-reading speech test set, the accuracies of the AdaBoost, RF, and LDA models were 90%, 80%, and 90%, respectively, while the accuracies of the other 3 models were all below 80%. On real-world word-reading and text-reading speech data, the classification models using AdaBoost and Random Forest still achieved high predictive accuracies (91.7% and 80.6% for AdaBoost and 86.1% and 77.8% for Random, respectively).
Conclusions: Analyzing vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models show excellent performance for classifying subthreshold depression individuals, and may thus potentially offer valuable assistance in the clinical and research settings.