{"title":"Predictive insights into U.S. students’ mathematics performance on PISA 2022 using ensemble tree-based machine learning models","authors":"Li Zhu , Hyesun You , Minju Hong , Zhenhan Fang","doi":"10.1016/j.ijer.2025.102537","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>In the latest Program for International Student Assessment (PISA) 2022 results, U.S. students earned the lowest math scores in two decades. Educators and stakeholders have endeavored to identify key malleable factors in an attempt to raise scores. Although more researchers are gradually incorporating machine learning (ML) techniques, most still rely on literature reviews by humans to identify important predictors. Here we focus on providing innovative insights into how to use ML models to identify predictors most strongly associated with students’ math performance.</div></div><div><h3>Methods and Results</h3><div>The dataset comprises 4,552 U.S. students in 154 schools from the PISA 2022. We used three ensemble tree-based ML models (Random Forest, XGBoost, and LightGBM) to select most influential predictors from 143 derived variables of student and school questionnaires. All three models showed high accuracy in predicting students’ math performance, with XGBoost performing best (rMSE = 69.82, training time = 4.14 s) and identifying 10 significant predictors. According to the accumulated local effects (ALEs) plots, three of them have general positive effects, five have roughly negative effects, and two have mixed effects on students’ math performance. When comparing these ML-identified predictors to those identified by literature review, the ML method has significantly improved the accuracy of predictor selection (<em>p</em>-value < 0.05) but offered lower interpretability.</div></div><div><h3>Conclusions</h3><div>We conclude that ML predictor selection is an effective alternative to LR for obtaining influential factors affecting student learning outcomes. 
Among the factors identified, math self-efficacy, ESCS, and math anxiety are strongly correlated with students’ math performance. The results provide valuable insights to implement shifts in instructional practices, targeted interventions, curriculum development, and policy decisions, ultimately contributing to enhancing the overall quality of U.S. math education.</div></div>","PeriodicalId":48076,"journal":{"name":"International Journal of Educational Research","volume":"130 ","pages":"Article 102537"},"PeriodicalIF":2.6000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Educational Research","FirstCategoryId":"95","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0883035525000047","RegionNum":3,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citation count: 0
Abstract
Background
In the latest Program for International Student Assessment (PISA) 2022 results, U.S. students earned the lowest math scores in two decades. Educators and stakeholders have endeavored to identify key malleable factors in an attempt to raise scores. Although more researchers are gradually incorporating machine learning (ML) techniques, most still rely on literature reviews by humans to identify important predictors. Here we focus on providing innovative insights into how to use ML models to identify predictors most strongly associated with students’ math performance.
Methods and Results
The dataset comprises 4,552 U.S. students in 154 schools from PISA 2022. We used three ensemble tree-based ML models (Random Forest, XGBoost, and LightGBM) to select the most influential predictors from 143 derived variables in the student and school questionnaires. All three models showed high accuracy in predicting students’ math performance, with XGBoost performing best (RMSE = 69.82, training time = 4.14 s) and identifying 10 significant predictors. According to the accumulated local effects (ALE) plots, three of these predictors have generally positive effects, five have roughly negative effects, and two have mixed effects on students’ math performance. Compared with selecting predictors by literature review, the ML method significantly improved the accuracy of predictor selection (p-value < 0.05) but offered lower interpretability.
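To make the ALE step above concrete, the following is a minimal, dependency-free sketch of how a first-order accumulated local effects curve can be computed for one predictor. It is not the authors' pipeline: the data are synthetic, `toy_predict` is a hypothetical stand-in for a fitted model's prediction function, and the names `quantile_edges`, `ale_curve`, `self_efficacy`, and `anxiety`, along with every coefficient, are assumptions chosen only for illustration.

```python
import random
from bisect import bisect_left


def quantile_edges(values, n_bins):
    """Bin edges at evenly spaced empirical quantiles, duplicates removed."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return sorted({ordered[round(k * last / n_bins)] for k in range(n_bins + 1)})


def ale_curve(predict, rows, feature, n_bins=10):
    """First-order ALE of `feature`; returns (edges, centered ALE at each edge)."""
    edges = quantile_edges([row[feature] for row in rows], n_bins)
    if len(edges) < 2:
        raise ValueError("feature needs at least two distinct values")
    n_intervals = len(edges) - 1
    sums = [0.0] * n_intervals
    counts = [0] * n_intervals
    for row in rows:
        # Interval k covers (edges[k], edges[k + 1]]; the minimum joins interval 0.
        k = max(bisect_left(edges, row[feature]) - 1, 0)
        # Local effect: move only this feature across its own interval,
        # keeping the row's other (possibly correlated) features fixed.
        upper = predict({**row, feature: edges[k + 1]})
        lower = predict({**row, feature: edges[k]})
        sums[k] += upper - lower
        counts[k] += 1
    # Accumulate the per-interval average local effects.
    accumulated = [0.0]
    for total, count in zip(sums, counts):
        accumulated.append(accumulated[-1] + (total / count if count else 0.0))
    # Center the curve so its count-weighted average over the data is zero.
    mean_effect = sum(
        0.5 * (accumulated[k] + accumulated[k + 1]) * counts[k]
        for k in range(n_intervals)
    ) / len(rows)
    return edges, [value - mean_effect for value in accumulated]


# Synthetic, correlated predictors (NOT PISA data; purely illustrative).
random.seed(0)
rows = []
for _ in range(200):
    efficacy = random.gauss(0.0, 1.0)
    rows.append({
        "self_efficacy": efficacy,
        "anxiety": -0.5 * efficacy + random.gauss(0.0, 1.0),
    })


def toy_predict(row):
    """Hypothetical stand-in for a fitted model; coefficients are arbitrary."""
    return 470.0 + 25.0 * row["self_efficacy"] - 15.0 * row["anxiety"]


edges, ale = ale_curve(toy_predict, rows, "self_efficacy", n_bins=5)
for edge, effect in zip(edges, ale):
    print(f"self_efficacy = {edge:6.2f}  ALE = {effect:7.2f}")
```

Because the toy model is linear, the curve rises by 25 points per unit of `self_efficacy` regardless of its correlation with `anxiety`. With a real tree ensemble, `predict` would wrap the fitted model, and the shape of the curve (rising, falling, or mixed) is what a positive, negative, or mixed effect refers to.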
Conclusions
We conclude that ML predictor selection is an effective alternative to literature review (LR) for identifying influential factors affecting student learning outcomes. Among the factors identified, math self-efficacy, math anxiety, and economic, social, and cultural status (ESCS) are strongly correlated with students’ math performance. The results provide valuable insights for implementing shifts in instructional practices, targeted interventions, curriculum development, and policy decisions, ultimately helping to enhance the overall quality of U.S. math education.
About the journal:
The International Journal of Educational Research publishes regular papers and special issues on specific topics of interest to international audiences of educational researchers. Examples of recent Special Issues published in the journal illustrate the breadth of topics that have been included in the journal: Students Perspectives on Learning Environments, Social, Motivational and Emotional Aspects of Learning Disabilities, Epistemological Beliefs and Domain, Analyzing Mathematics Classroom Cultures and Practices, and Music Education: A site for collaborative creativity.