Feature Contributions and Predictive Accuracy in Modeling Adolescent Daytime Sleepiness Using Machine Learning: The MeLiSA Study

IF 2.7, CAS Zone 3 (Medicine), JCR Q3 (Neurosciences)

Brain Sciences, Pub Date: 2024-10-12, DOI: 10.3390/brainsci14101015

Mohammed A Mamun, Jannatul Mawa Misti, Md Emran Hasan, Firoj Al-Mamun, Moneerah Mohammad ALmerab, Johurul Islam, Mohammad Muhit, David Gozal

Citations: 0

Abstract

Feature Contributions and Predictive Accuracy in Modeling Adolescent Daytime Sleepiness Using Machine Learning: The MeLiSA Study.

Background: Excessive daytime sleepiness (EDS) among adolescents poses significant risks to academic performance, mental health, and overall well-being. This study examines the prevalence and risk factors of EDS in adolescents in Bangladesh and uses machine learning approaches to predict the risk of EDS.

Methods: A cross-sectional study was conducted among 1496 adolescents using a structured questionnaire. Data were collected through a two-stage stratified cluster sampling method. Chi-square tests and logistic regression analyses were performed using SPSS. Machine learning models, including Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Gradient Boosting Machine (GBM), were employed to identify and predict EDS risk factors using Python and Google Colab.

Results: The prevalence of EDS in the cohort was 11.6%. SHAP values from the CatBoost model identified self-rated health status, gender, and depression as the most significant predictors of EDS. Among the models, GBM achieved the highest accuracy (90.15%) and precision (88.81%), while CatBoost had comparable accuracy (89.48%) and the lowest log loss (0.25). ROC-AUC analysis showed that CatBoost and GBM performed robustly in distinguishing between EDS and non-EDS cases, with AUC scores of 0.86. Both models demonstrated superior predictive performance for EDS compared with the other models.

Conclusions: The study emphasizes the role of health and demographic factors in predicting EDS among adolescents in Bangladesh. Machine learning techniques offer valuable insights into the relative contribution of these factors and can guide targeted interventions. Future research should include longitudinal and interventional studies in diverse settings to improve generalizability and develop effective strategies for managing EDS among adolescents.
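
The abstract reports four evaluation metrics (accuracy, precision, log loss, and ROC-AUC) without saying how they were computed. As a rough illustration of what each number measures, the following sketch implements them in plain Python on a small made-up example. The function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions for illustration; none of them come from the study, and the authors may well have used library implementations instead.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    """Of the cases predicted positive, the fraction that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = EDS, 0 = non-EDS (not taken from the study).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
y_prob = [0.1, 0.2, 0.05, 0.3, 0.6, 0.15, 0.8, 0.4, 0.1, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("accuracy :", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("log loss :", log_loss(y_true, y_prob))
print("ROC-AUC  :", roc_auc(y_true, y_prob))
```

Accuracy and precision depend on the chosen threshold, whereas log loss and ROC-AUC are computed from the predicted probabilities directly, which is one reason a model can rank first on one metric and not on another.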

Source journal

Brain Sciences (Neuroscience: General Neuroscience)

CiteScore: 4.80

Self-citation rate: 9.10%

Articles published: 1472

Review time: 18.71 days

About the journal: Brain Sciences (ISSN 2076-3425) is a peer-reviewed scientific journal that publishes original articles, critical reviews, research notes and short communications in the areas of cognitive neuroscience, developmental neuroscience, molecular and cellular neuroscience, neural engineering, neuroimaging, neurolinguistics, neuropathy, systems neuroscience, and theoretical and computational neuroscience. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files or software regarding the full details of the calculation and experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.