Sound as a bell: a deep learning approach for health status classification through speech acoustic biomarkers.

IF 5.3; Zone 3 (Medicine); Q1, INTEGRATIVE & COMPLEMENTARY MEDICINE
Yanbing Wang, Haiyan Wang, Zhuoxuan Li, Haoran Zhang, Liwen Yang, Jiarui Li, Zixiang Tang, Shujuan Hou, Qi Wang
Chinese Medicine, vol. 19, no. 1, p. 101. Journal Article. Published 2024-07-24.
DOI: https://doi.org/10.1186/s13020-024-00973-3
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11267751/pdf/

Abstract

Background: Human health is a complex, dynamic concept encompassing a spectrum of states shaped by genetic, environmental, physiological, and psychological factors. Traditional Chinese Medicine categorizes health into nine body constitution types, each reflecting a particular balance or imbalance of vital energies that influences physical, mental, and emotional states. Machine learning models that analyze speech patterns offer a promising, non-invasive complement to the diagnosis of conditions such as Alzheimer's disease, dementia, and respiratory diseases. This study aims to use speech audio to identify subhealth populations characterized by unbalanced constitution types.

Methods: Participants aged 18-45 were selected from the Acoustic Study of Health. Audio was recorded with ATR2500X-USB microphones and Praat software. Exclusion criteria included recent illness, dental issues, and specific medical histories. The audio data were converted to Mel-frequency cepstral coefficients (MFCCs) for model training. Three deep learning models were implemented in Python to classify health status: a 1-dimensional convolutional network (Conv1D), a 2-dimensional convolutional network (Conv2D), and a Long Short-Term Memory network (LSTM). Saliency maps were generated to provide model explainability.
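The MFCC preprocessing step mentioned in the Methods can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the abstract does not state the sampling rate, frame size, hop length, or number of coefficients, so the values below (16 kHz, 512-sample frames, 160-sample hop, 26 mel filters, 13 coefficients) and the helper names `mel_filterbank` and `mfcc` are assumptions chosen only for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Compute MFCCs; all default parameters are illustrative assumptions.

    Returns an array of shape (n_frames, n_mfcc).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (signal.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # Mel-band energies, then log compression.
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the mel bands; keep the first n_mfcc terms.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * k[:, None])
    return log_mel @ basis.T

# Usage: one second of a synthetic 220 Hz tone standing in for a recording.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2.0 * np.pi * 220.0 * t))
print(features.shape)  # (97, 13)
```

The resulting frames-by-coefficients matrix is the kind of input a Conv1D or LSTM model can read as a sequence and a Conv2D model can read as a single-channel image. In practice an audio library would usually replace the hand-written steps.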

Results: The study used 1,378 recordings from balanced (healthy) constitution types and 1,413 from unbalanced (subhealth) types. The Conv1D model achieved 91.91% training accuracy and 84.19% validation accuracy. The Conv2D model achieved 96.19% training accuracy and 84.93% validation accuracy. The LSTM model achieved 92.79% training accuracy and 87.13% validation accuracy, with early signs of overfitting. AUC scores were 0.92 and 0.94 (Conv1D), 0.99 (Conv2D), and 0.97 (LSTM). All models demonstrated robust performance, with Conv2D showing the strongest discrimination (highest AUC).
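For readers less familiar with the two metrics reported here, the following sketch computes accuracy at a fixed threshold and ROC AUC from predicted scores. It is a generic illustration rather than the authors' evaluation code: the function names `roc_auc` and `accuracy`, the 0.5 threshold, and the toy labels and scores are all assumptions. AUC is computed with the rank-based (Mann-Whitney U) formulation, which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    i = 0
    while i < scores.size:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Tied scores share the average of their 1-based ranks.
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    labels = np.asarray(labels).astype(bool)
    predictions = np.asarray(scores, dtype=float) >= threshold
    return float(np.mean(predictions == labels))

# Toy example (hypothetical data): 1 = unbalanced, 0 = balanced.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))   # 0.75
print(accuracy(y_true, y_score))  # 0.75
```

Because AUC depends only on how the scores rank the examples, a model can have a higher AUC than another while having lower accuracy at a particular threshold, which is consistent with Conv2D having the highest AUC here while LSTM has the highest validation accuracy.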

Conclusions: Deep learning classification of health status from human speech audio, based on body constitution types, showed promising results with the Conv1D, Conv2D, and LSTM models. Analysis of ROC curves and of training and validation accuracy showed that all three models distinguished robustly between balanced and unbalanced constitution types. Conv2D stood out, with good accuracy, while Conv1D and LSTM also performed well, supporting the reliability of all three models. The study integrates constitution theory with deep learning to classify subhealth populations using a noninvasive approach, thereby promoting personalized medicine and early intervention strategies.

Source journal: Chinese Medicine (INTEGRATIVE & COMPLEMENTARY MEDICINE; PHARMACOLOGY & PHARMACY)
CiteScore: 7.90
Self-citation rate: 4.10%
Articles published: 133
Review time: 31 weeks
About the journal: Chinese Medicine is an open access, online journal publishing evidence-based, scientifically justified, and ethical research into all aspects of Chinese medicine. Areas of interest include recent advances in herbal medicine, clinical nutrition, clinical diagnosis, acupuncture, pharmaceutics, biomedical sciences, epidemiology, education, informatics, sociology, and psychology that are relevant and significant to Chinese medicine. Examples of research approaches include biomedical experimentation, high-throughput technology, clinical trials, systematic reviews, meta-analysis, sampled surveys, simulation, data curation, statistics, omics, translational medicine, and integrative methodologies. Chinese Medicine is a credible channel to communicate unbiased scientific data, information, and knowledge in Chinese medicine among researchers, clinicians, academics, and students in Chinese medicine and other scientific disciplines of medicine.