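Exploring and Identifying Key Factors in Predicting Dyslexia in Children: Advanced Machine Learning Algorithms From Screening to Diagnosis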

IF 2.7, Zone 3 (Psychology), JCR Q1 (PSYCHOLOGY, CLINICAL)

Clinical Psychology & Psychotherapy, 32(3). Published 2025-04-30. DOI: 10.1002/cpp.70077

Abdullah Alrubaian


Citations: 0

Abstract



Introduction

The current study aimed to develop and validate machine learning (ML)-based predictive models for early dyslexia detection in children by integrating neurocognitive, linguistic and behavioural predictors.

Method

A cross-sectional study was conducted with 300 Saudi Arabian children aged 6–12 years (150 with dyslexia, 150 controls) and their parents. Participants were assessed for attention, phonological awareness, rapid automatised naming (RAN), cognitive flexibility and other predictors. Four ML models (logistic regression, random forest, XGBoost and an ensemble) were trained and evaluated using the area under the ROC curve (AUC), sensitivity and specificity. Recursive feature elimination (RFE) was used to identify the key predictors.
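The abstract does not say which estimator drove the RFE or how the cross-validation was wired in. Purely as an illustration of the technique, the sketch below performs recursive feature elimination with a plain-Python logistic regression: fit the model, drop the feature with the smallest absolute standardized coefficient, and repeat until `n_keep` features remain. The data, the feature names (`signal_a`, `signal_b`, `noise_1`, `noise_2`) and the hyperparameters are hypothetical, and the cross-validated choice of how many features to keep (15-fold in the study) is omitted.

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def standardize(X):
    """Rescale each column to mean 0 and SD 1 so coefficient sizes are comparable."""
    stats = []
    for col in zip(*X):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        stats.append((m, s))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def rfe(X, y, names, n_keep):
    """Recursive feature elimination: refit, drop the smallest |coefficient|, repeat."""
    remaining = list(range(len(names)))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return [names[j] for j in remaining]


# Synthetic, hypothetical data: two informative features and two pure-noise features.
random.seed(0)
names = ["signal_a", "signal_b", "noise_1", "noise_2"]
y = [1] * 100 + [0] * 100
X = [[1.5 * yi + random.gauss(0, 1), 1.0 * yi + random.gauss(0, 1),
      random.gauss(0, 1), random.gauss(0, 1)] for yi in y]

selected = rfe(standardize(X), y, names, n_keep=2)
print(selected)
```

Where scikit-learn is available, `sklearn.feature_selection.RFECV` wraps the same loop and picks the number of retained features by cross-validation (for example `cv=15`, `scoring="roc_auc"`); the abstract does not state whether the study used that implementation.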

Results

RFE with 15-fold cross-validation identified attention, RAN, early language delay, phonological awareness and cognitive flexibility as the top five predictors of dyslexia. The ML models showed high diagnostic accuracy for dyslexia detection. Logistic regression performed best, with an area under the curve (AUC) of 0.95 (95% CI: 0.92–0.98), sensitivity of 97%, specificity of 91% and overall accuracy of 94%. Random forest and XGBoost achieved slightly lower but still high AUCs (0.91 and 0.93, respectively), with sensitivity of 95% and specificity of 91%. The ensemble model, which weighted XGBoost at 40%, random forest at 30% and logistic regression at 30%, retained an AUC of 0.93; this weighting was intended to combine the strengths of the individual algorithms while preserving interpretability.
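The headline figures rest on simple arithmetic, sketched below in plain Python. AUC is computed as the probability that a randomly chosen child with dyslexia receives a higher score than a randomly chosen control; overall accuracy is a class-weighted average of sensitivity and specificity, so with a 150:150 balance, 0.5 × 97% + 0.5 × 91% reproduces the reported 94% (assuming the evaluation data keep that 1:1 balance); and the ensemble is modelled as a weighted soft vote over predicted probabilities. The soft-vote reading of "weighted contributions", the function names and all scores and probabilities in the demo are assumptions for illustration, not the study's code.

```python
def auc(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy = (TP + TN) / (P + N), where TP = sens * P and TN = spec * N."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


def weighted_ensemble(p_xgb, p_rf, p_lr, weights=(0.4, 0.3, 0.3)):
    """Weighted soft vote over each base model's predicted probability of dyslexia."""
    w_xgb, w_rf, w_lr = weights
    if abs(w_xgb + w_rf + w_lr - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return w_xgb * p_xgb + w_rf * p_rf + w_lr * p_lr


# Hypothetical model scores for three children with dyslexia and three controls.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(toy_auc, 3))  # 0.889 (8 of 9 positive/negative pairs ranked correctly)

# Reported logistic-regression rates, assuming a balanced 150 vs 150 evaluation set.
acc = accuracy_from_rates(0.97, 0.91, n_pos=150, n_neg=150)
print(round(acc, 2))  # 0.94

# Hypothetical probabilities from the three base models for one child.
p = weighted_ensemble(p_xgb=0.80, p_rf=0.60, p_lr=0.70)
print(round(p, 2))  # 0.71
```

scikit-learn's `VotingClassifier` with `voting="soft"` and a `weights` list computes the same weighted average of the base models' predicted probabilities; the abstract does not say whether the study combined its models this way or by another scheme such as stacking.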

Conclusion

This study demonstrated the potential of ML in dyslexia diagnostics. By systematically prioritising phonological awareness, RAN and attention deficits, ML models offer a scalable, objective framework for early identification. Such tools could reduce reliance on subjective assessments and enable timely interventions that mitigate dyslexia's long-term impacts.


Source journal

Clinical Psychology & Psychotherapy (PSYCHOLOGY, CLINICAL)

CiteScore: 6.30
Self-citation rate: 5.60%
Articles published: 106

Journal description: Clinical Psychology & Psychotherapy aims to keep clinical psychologists and psychotherapists up to date with new developments in their fields. The Journal will provide an integrative impetus both between theory and practice and between different orientations within clinical psychology and psychotherapy. Clinical Psychology & Psychotherapy will be a forum in which practitioners can present their wealth of expertise and innovations in order to make these available to a wider audience. Equally, the Journal will contain reports from researchers who want to address a larger clinical audience with clinically relevant issues and clinically valid research.