Role of risk factors and their variable types in predicting noise-induced hearing loss using artificial intelligence algorithms
Zhengheng Zhang, Longteng Jiang, Meibian Zhang, Yuan Pan, Jinnan Zheng, Anqi Liu, Weijiang Hu, Xin Jin
Hearing Research, Volume 465, Article 109353. Published 2025-07-01. DOI: 10.1016/j.heares.2025.109353
https://www.sciencedirect.com/science/article/pii/S0378595525001716
Abstract
Early prediction and warning of occupational noise-induced hearing loss (NIHL) in workers is critical. This study aimed to explore the role of risk factors and their variable types in NIHL prediction using machine learning (ML) techniques. Data on exposure and NIHL were sourced from the Chinese National Occupational Disease Surveillance Programs and field measurements involving 15,160 workers. We developed predictive models based on logistic regression, three tree-based algorithms (random forest [RF], extreme gradient boosting [XGBoost], and light gradient boosting machine [LGBM]), and a tabular neural network (TabNet). Eight features were evaluated through logistic regression and ML feature importance analyses: age, sex, noise exposure duration (ED), A-weighted equivalent sound pressure level (LAeq,8h), kurtosis, systolic blood pressure, diastolic blood pressure, and hearing protection device (HPD) usage. Models were trained on both the original and the categorized versions of the variables to compare the predictive value of the two variable types and to assess the applicability of each algorithm. Multivariate logistic regression indicated that age, noise ED, LAeq,8h, sex, and HPD usage were significantly associated with NIHL (P < 0.05). For the tree-based and TabNet algorithms, but not for logistic regression, models built on the original variables outperformed those built on the categorized variables (P < 0.05). The LGBM model using the original variables achieved the best performance on the test set, with an area under the curve (AUC) of 0.745 (95% CI 0.729–0.763). Feature importance analysis identified LAeq,8h (LGBM), sex (XGBoost), age (RF), and kurtosis (TabNet) as key predictive variables, consistent with the logistic regression results. Our study concludes that risk factors kept as continuous variables provided greater predictive value for NIHL than their categorized versions. Tree-based and TabNet algorithms offer effective methods for assessing and predicting NIHL.
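The comparison between original and categorized variables, and the reporting of an AUC with a 95% CI, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the 85 dB(A) cut point and the logistic risk curve are hypothetical, the raw variable is used directly as a score instead of a trained model, and the percentile bootstrap is an assumption, since the abstract does not state how the interval was computed.

```python
import math
import random


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    pairs = sorted(zip(scores, labels))
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi


def categorize(value, cut_points):
    """Ordinal category index: the number of cut points at or below value."""
    return sum(value >= c for c in cut_points)


# Synthetic cohort: hypothetical noise levels in dB(A), with risk rising
# smoothly with level. None of these numbers come from the study.
rng = random.Random(42)
levels = [rng.uniform(70.0, 100.0) for _ in range(1000)]
labels = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(x - 88.0) / 4.0)) else 0
          for x in levels]
binned = [categorize(x, [85.0]) for x in levels]  # hypothetical single cut point

for name, scores in (("continuous", levels), ("categorized", binned)):
    low, high = bootstrap_auc_ci(labels, scores)
    print(f"{name}: AUC={auc(labels, scores):.3f} (95% CI {low:.3f}-{high:.3f})")
```

Binning collapses every worker within a category onto the same score, so the model can no longer rank them against each other. That loss of ranking information is one plausible reason the flexible algorithms in the study benefited from the original variables.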
About the journal:
The aim of the journal is to provide a forum for papers concerned with basic peripheral and central auditory mechanisms. Emphasis is on experimental and clinical studies, but theoretical and methodological papers will also be considered. The journal publishes original research papers, review and mini-review articles, rapid communications, method/protocol articles, and perspective articles.
Papers submitted should deal with auditory anatomy, physiology, psychophysics, imaging, modeling and behavioural studies in animals and humans, as well as hearing aids and cochlear implants. Papers dealing with the vestibular system are also considered for publication. Papers on comparative aspects of hearing and on effects of drugs and environmental contaminants on hearing function will also be considered. Clinical papers will be accepted when they contribute to the understanding of normal and pathological hearing functions.