Myopia is a prevalent refractive error, particularly among young adults, and is a growing global concern. This study aims to predict myopia among undergraduate students using ensemble machine learning techniques and to identify key risk factors associated with its development.
A cross-sectional study was conducted in Dinajpur city, collecting 514 samples through a self-structured questionnaire covering demographic information, myopia prevalence and risk factors, knowledge and attitudes, and daily activities. Four feature selection techniques, namely Boruta-based feature selection (BFS), Least Absolute Shrinkage and Selection Operator (LASSO) regression, forward and backward selection, and Random Forest (RF), identified 12 key predictive features. Using these features, ensemble methods, including logistic regression, artificial neural network, RF, Support Vector Machine, extreme gradient boosting, and light gradient boosting machine, were employed for prediction. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC).
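The stacking setup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline. The data are synthetic (514 samples, 12 features, generated with `make_classification`). The meta-learner (logistic regression), the 80/20 split, and all hyperparameters are assumptions, since the abstract does not specify them. scikit-learn's `GradientBoostingClassifier` stands in for extreme gradient boosting and light gradient boosting machine so that the example depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the questionnaire data (assumption: 514 rows,
# 12 selected features, binary myopia label).
X, y = make_classification(
    n_samples=514, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners; scale-sensitive models are wrapped with a scaler.
base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("ann", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    # Stand-in for XGBoost / LightGBM (assumption, see lead-in).
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# The meta-learner is trained on out-of-fold base-learner probabilities.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
proba = stack.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}")
```

Because the data here are synthetic, the printed scores say nothing about the study's results. The point is the structure: base learners feed cross-validated predictions to a meta-learner.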
The stacking ensemble model achieved the highest performance, with an accuracy of 95.42%, recall of 93.42%, precision of 98.85%, F1-score of 96.08%, and AUC of 0.979. SHapley Additive exPlanations analysis identified key risk factors, including visual impairment, family history of myopia, excessive screen time, and insufficient outdoor activities.
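For reference, the threshold-based metrics reported above follow directly from a binary confusion matrix. The sketch below uses hypothetical counts chosen only for illustration; they are not the study's confusion matrix. AUC is omitted because it requires predicted scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (assumption, for illustration only).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```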
These findings demonstrate the effectiveness of ensemble machine learning in predicting myopia and highlight the potential for early intervention strategies. Identifying high-risk individuals allows targeted awareness programs and lifestyle modifications to be directed where they can help mitigate myopia progression among undergraduate students.