Mousa Moradi, Saber Kazeminasab Hashemabad, Daniel M Vu, Allison R Soneru, Asahi Fujita, Mengyu Wang, Tobias Elze, Mohammad Eslami, Nazlee Zebardast
{"title":"PyGlaucoMetrics: A Stacked Weight-Based Machine Learning Approach for Glaucoma Detection Using Visual Field Data.","authors":"Mousa Moradi, Saber Kazeminasab Hashemabad, Daniel M Vu, Allison R Soneru, Asahi Fujita, Mengyu Wang, Tobias Elze, Mohammad Eslami, Nazlee Zebardast","doi":"10.3390/medicina61030541","DOIUrl":null,"url":null,"abstract":"<p><p><i>Background and Objectives</i>: Glaucoma (GL) classification is crucial for early diagnosis and treatment, yet relying solely on stand-alone models or International Classification of Diseases (ICD) codes is insufficient due to limited predictive power and inconsistencies in clinical labeling. This study aims to improve GL classification using stacked weight-based machine learning models. <i>Materials and Methods</i>: We analyzed a subset of 33,636 participants (58% female) with 340,444 visual fields (VFs) from the Mass Eye and Ear (MEE) dataset. Five clinically relevant GL detection models (LoGTS, UKGTS, Kang, HAP2_part1, and Foster) were selected to serve as base models. Two multi-layer perceptron (MLP) models were trained using 52 total deviation (TD) and pattern deviation (PD) values from Humphrey field analyzer (HFA) 24-2 VF tests, along with four clinical variables (age, gender, follow-up time, and race) to extract model weights. These weights were then utilized to train three meta-learners, including logistic regression (LR), extreme gradient boosting (XGB), and MLP, to classify cases as GL or non-GL. <i>Results</i>: The MLP meta-learner achieved the highest performance, with an accuracy of 96.43%, an F-score of 96.01%, and an AUC of 97.96%, while also demonstrating the lowest prediction uncertainty (0.08 ± 0.13). XGB followed with 92.86% accuracy, a 92.31% F-score, and a 96.10% AUC. LR had the lowest performance, with 89.29% accuracy, an 86.96% F-score, and a 94.81% AUC, as well as the highest uncertainty (0.58 ± 0.07). Permutation importance analysis revealed that the superior temporal sector was the most influential VF feature, with importance scores of 0.08 in Kang's and 0.04 in HAP2_part1 models. Among clinical variables, age was the strongest contributor (score = 0.3). <i>Conclusions</i>: The meta-learner outperformed stand-alone models in GL classification, achieving an accuracy improvement of 8.92% over the best-performing stand-alone model (LoGTS with 87.51%), offering a valuable tool for automated glaucoma detection.</p>","PeriodicalId":49830,"journal":{"name":"Medicina-Lithuania","volume":"61 3","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11944261/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medicina-Lithuania","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3390/medicina61030541","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
引用次数: 0
Abstract
Background and Objectives: Glaucoma (GL) classification is crucial for early diagnosis and treatment, yet relying solely on stand-alone models or International Classification of Diseases (ICD) codes is insufficient due to limited predictive power and inconsistencies in clinical labeling. This study aims to improve GL classification using stacked weight-based machine learning models. Materials and Methods: We analyzed a subset of 33,636 participants (58% female) with 340,444 visual fields (VFs) from the Mass Eye and Ear (MEE) dataset. Five clinically relevant GL detection models (LoGTS, UKGTS, Kang, HAP2_part1, and Foster) were selected to serve as base models. Two multi-layer perceptron (MLP) models were trained using 52 total deviation (TD) and pattern deviation (PD) values from Humphrey field analyzer (HFA) 24-2 VF tests, along with four clinical variables (age, gender, follow-up time, and race) to extract model weights. These weights were then utilized to train three meta-learners, including logistic regression (LR), extreme gradient boosting (XGB), and MLP, to classify cases as GL or non-GL. Results: The MLP meta-learner achieved the highest performance, with an accuracy of 96.43%, an F-score of 96.01%, and an AUC of 97.96%, while also demonstrating the lowest prediction uncertainty (0.08 ± 0.13). XGB followed with 92.86% accuracy, a 92.31% F-score, and a 96.10% AUC. LR had the lowest performance, with 89.29% accuracy, an 86.96% F-score, and a 94.81% AUC, as well as the highest uncertainty (0.58 ± 0.07). Permutation importance analysis revealed that the superior temporal sector was the most influential VF feature, with importance scores of 0.08 in Kang's and 0.04 in HAP2_part1 models. Among clinical variables, age was the strongest contributor (score = 0.3). Conclusions: The meta-learner outperformed stand-alone models in GL classification, achieving an accuracy improvement of 8.92% over the best-performing stand-alone model (LoGTS with 87.51%), offering a valuable tool for automated glaucoma detection.
期刊介绍:
The journal’s main focus is on reviews as well as clinical and experimental investigations. The journal aims to advance knowledge related to problems in medicine in developing countries as well as developed economies, to disseminate research on global health, and to promote and foster prevention and treatment of diseases worldwide. MEDICINA publications cater to clinicians, diagnosticians and researchers, and serve as a forum to discuss the current status of health-related matters and their impact on a global and local scale.