Machine learning-based prediction of hearing loss: Findings of the US NHANES from 2003 to 2018

IF 2.5 | Zone 2 (Medicine) | JCR Q1, AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Hearing Research | Pub Date: 2025-03-30 | DOI: 10.1016/j.heares.2025.109252

Yi Mi, Pin Sun

Volume: 461 | Pages: Article 109252 | Publication type: Journal Article
Source: https://www.sciencedirect.com/science/article/pii/S0378595525000711

Citations: 0

Abstract



The prevalence of hearing loss (HL) has emerged as an escalating public health concern globally. The objective of this study was to use data from the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable predictive machine learning (ML) model for HL.

After applying the established inclusion and exclusion criteria, a total of 2814 participants were randomly assigned to one of two groups, one for training and one for validation of the predictive models. We identified the most significant variables using Recursive Feature Elimination and built HL prediction models with eight different ML algorithms in order to select the optimal one. The generalization ability of the models was evaluated via 10-fold cross-validation. Subsequently, three interpretation methods, feature importance analysis, a generalized linear model (GLM) and restricted cubic splines (RCS), were integrated into the ML pipeline for model interpretation.
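The 10-fold cross-validation step can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `k_fold_indices` and `train_validation_rounds`, the seed, and the choice to fold all 2814 participants (rather than only the training group, whose size the abstract does not give) are assumptions made here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def train_validation_rounds(n_samples, k=10, seed=0):
    """Yield one (train, validation) pair of index lists per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        # Every fold except the held-out one contributes to training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    n_participants = 2814  # participant count reported in the abstract
    for round_no, (train, val) in enumerate(
        train_validation_rounds(n_participants), start=1
    ):
        print(round_no, len(train), len(val))
```

Each participant is held out exactly once, so averaging a metric over the ten rounds gives an estimate of performance on unseen data. A stratified split, which keeps the HL proportion similar in every fold, would be a natural refinement, but the abstract does not say whether one was used.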

After the data were balanced using the Synthetic Minority Oversampling Technique (SMOTE), the Random Forest (RF) model exhibited superior performance across all evaluation metrics, particularly AUC, PR-AUC and F1 score. Feature importance analysis uncovered significant correlations between HL and the top 10 features: age, blood lead (Pb) level, urine thallium (Tl) level, BMI, total energy, urine antimony (Sb) level, vitamin E intake, urine cobalt (Co) level, calcium intake and urine cesium (Cs) level. Moreover, both univariate and multivariate GLMs identified blood Pb [OR (95% CI): 1.169 (1.037, 1.311)] and vitamin E intake [OR (95% CI): 0.776 (0.641, 0.928)] as the main features associated with HL. The RCS analysis further revealed that, after adjustment for confounders, increased blood Pb level and decreased vitamin E intake each correspond to a proportional rise in the anticipated risk of HL.
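The reported odds ratios can also be read on the log-odds scale, which is where a logistic-type GLM estimates its coefficients. The sketch below back-calculates an approximate coefficient and standard error from each reported OR and 95% CI. It assumes the intervals are Wald-type (symmetric on the log scale), which the abstract does not state. The helper names are hypothetical, and the unit of each exposure is not given in the abstract, so the coefficients are per whatever unit the authors used.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_from_or(odds_ratio, ci_low, ci_high, z=Z_95):
    """Return (beta, se): the log-odds coefficient and an approximate
    standard error implied by an odds ratio and its Wald-type CI."""
    beta = math.log(odds_ratio)
    # A Wald interval spans 2 * z standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def or_with_ci(beta, se, z=Z_95):
    """Map a coefficient and standard error back to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Values as reported in the abstract: name -> (OR, CI lower, CI upper).
REPORTED = {
    "blood Pb": (1.169, 1.037, 1.311),
    "vitamin E intake": (0.776, 0.641, 0.928),
}

if __name__ == "__main__":
    for name, (odds_ratio, low, high) in REPORTED.items():
        beta, se = log_odds_from_or(odds_ratio, low, high)
        print(f"{name}: beta={beta:.3f}, SE={se:.3f}")
```

A positive coefficient (blood Pb) corresponds to an OR above 1 and a negative one (vitamin E intake) to an OR below 1. Neither reported interval includes 1.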

Our ML models identify key factors that, if validated by future studies, will have important implications for hearing conservation. Furthermore, these ML-based point-of-care prediction models will help overcome barriers to hearing healthcare and enable the efficient allocation of resources by accurately identifying individuals who are in dire need of hearing assessment.


Source journal

Hearing Research (Medicine: Otorhinolaryngology)

CiteScore: 5.30

Self-citation rate: 14.30%

Articles published: 163

Review time: 75 days

About the journal: The aim of the journal is to provide a forum for papers concerned with basic peripheral and central auditory mechanisms. Emphasis is on experimental and clinical studies, but theoretical and methodological papers will also be considered. The journal publishes original research papers, review and mini-review articles, rapid communications, method/protocol and perspective articles. Papers submitted should deal with auditory anatomy, physiology, psychophysics, imaging, modeling and behavioural studies in animals and humans, as well as hearing aids and cochlear implants. Papers dealing with the vestibular system are also considered for publication. Papers on comparative aspects of hearing and on effects of drugs and environmental contaminants on hearing function will also be considered. Clinical papers will be accepted when they contribute to the understanding of normal and pathological hearing functions.