Sound as a bell: a deep learning approach for health status classification through speech acoustic biomarkers.

IF 5.3, Zone 3 (Medicine), Q1, INTEGRATIVE & COMPLEMENTARY MEDICINE

Chinese Medicine, Pub Date: 2024-07-24, DOI: 10.1186/s13020-024-00973-3

Yanbing Wang, Haiyan Wang, Zhuoxuan Li, Haoran Zhang, Liwen Yang, Jiarui Li, Zixiang Tang, Shujuan Hou, Qi Wang


Citations: 0

Abstract


Sound as a bell: a deep learning approach for health status classification through speech acoustic biomarkers.

Background: Human health is a complex, dynamic concept encompassing a spectrum of states influenced by genetic, environmental, physiological, and psychological factors. Traditional Chinese Medicine categorizes health into nine body constitutional types, each reflecting unique balances or imbalances in vital energies, influencing physical, mental, and emotional states. Advances in machine learning models offer promising avenues for diagnosing conditions like Alzheimer's, dementia, and respiratory diseases by analyzing speech patterns, enabling complementary non-invasive disease diagnosis. The study aims to use speech audio to identify subhealth populations characterized by unbalanced constitution types.

Methods: Participants aged 18-45 were selected from the Acoustic Study of Health. Audio recordings were collected using ATR2500X-USB microphones and Praat software. Exclusion criteria included recent illness, dental issues, and specific medical histories. The audio data were converted to Mel-frequency cepstral coefficients (MFCCs) for model training. Three deep learning models were implemented in Python to classify health status: a 1-dimensional convolutional network (Conv1D), a 2-dimensional convolutional network (Conv2D), and a long short-term memory network (LSTM). Saliency maps were generated to provide model explainability.
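The MFCC preprocessing step mentioned in the Methods can be illustrated with a minimal NumPy sketch. The abstract does not state the sampling rate, frame length, hop size, filter count, or the library used, so every parameter value and function name below is an assumption chosen for illustration, not the authors' pipeline.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Log mel-band energies (small constant avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II over the mel bands; keep the first n_coeffs.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * np.arange(n_filters) + 1)
                   / (2 * n_filters))
    return log_mel @ basis.T

# Usage on a synthetic one-second 440 Hz tone (a stand-in for a real recording).
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

A matrix of this form (frames by coefficients) could plausibly be fed to a Conv1D or LSTM model as a sequence, or to a Conv2D model as a single-channel image, although the abstract does not describe the input layout the authors used.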

Results: The study used 1,378 recordings from balanced (healthy) and 1,413 from unbalanced (subhealth) types. The Conv1D model achieved a training accuracy of 91.91% and validation accuracy of 84.19%. The Conv2D model had 96.19% training accuracy and 84.93% validation accuracy. The LSTM model showed 92.79% training accuracy and 87.13% validation accuracy, with early signs of overfitting. AUC scores were 0.92 and 0.94 (Conv1D), 0.99 (Conv2D), and 0.97 (LSTM). All models demonstrated robust performance, with Conv2D excelling in discrimination accuracy.
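To put the reported figures in context, the short sketch below derives the majority-class baseline and the gap between training and validation accuracy for each model, using only the numbers quoted in the Results. It also includes a small pairwise-comparison AUC function; the toy labels and scores passed to it are hypothetical and are not data from the study.

```python
# Recording counts and accuracies as quoted in the Results paragraph.
n_balanced, n_unbalanced = 1378, 1413
accuracy = {                      # model: (training %, validation %)
    "Conv1D": (91.91, 84.19),
    "Conv2D": (96.19, 84.93),
    "LSTM": (92.79, 87.13),
}

# Accuracy of always predicting the larger class.
majority_baseline = max(n_balanced, n_unbalanced) / (n_balanced + n_unbalanced)

# Training minus validation accuracy, in percentage points.
gap = {name: round(train - val, 2) for name, (train, val) in accuracy.items()}

def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    print(f"majority baseline: {majority_baseline:.4f}")
    for name, value in gap.items():
        print(f"{name}: train/validation gap = {value} points")
    # Hypothetical toy scores, only to show how roc_auc is called.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because the two classes are nearly balanced, the majority baseline sits close to 50%, so the reported validation accuracies are well above chance. The gaps computed here are one simple way to compare how far each model's training accuracy exceeds its validation accuracy.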

Conclusions: The deep learning classification of human speech audio for health status using body constitution types showed promising results with Conv1D, Conv2D, and LSTM models. Analysis of ROC curves, training accuracy, and validation accuracy showed all models robustly distinguished between balanced and unbalanced constitution types. Conv2D excelled with good accuracy, while Conv1D and LSTM also performed well, affirming their reliability. The study integrates constitution theory and deep learning technologies to classify subhealth populations using a noninvasive approach, thereby promoting personalized medicine and early intervention strategies.


Source journal

Chinese Medicine (INTEGRATIVE & COMPLEMENTARY MEDICINE; PHARMACOLOGY & PHARMACY)

CiteScore: 7.90

Self-citation rate: 4.10%

Articles published: 133

Review time: 31 weeks

About the journal: Chinese Medicine is an open access, online journal publishing evidence-based, scientifically justified, and ethical research into all aspects of Chinese medicine. Areas of interest include recent advances in herbal medicine, clinical nutrition, clinical diagnosis, acupuncture, pharmaceutics, biomedical sciences, epidemiology, education, informatics, sociology, and psychology that are relevant and significant to Chinese medicine. Examples of research approaches include biomedical experimentation, high-throughput technology, clinical trials, systematic reviews, meta-analysis, sampled surveys, simulation, data curation, statistics, omics, translational medicine, and integrative methodologies. Chinese Medicine is a credible channel to communicate unbiased scientific data, information, and knowledge in Chinese medicine among researchers, clinicians, academics, and students in Chinese medicine and other scientific disciplines of medicine.