Huawei Zhou, Shuanghong Shen, Yu Su, Yongchun Miao, Qi Liu, Linbo Zhu, Junyu Lu, Zhenya Huang
Title: LLM-EPSP: Large language model empowered early prediction of student performance
DOI: 10.1016/j.ipm.2025.104351
Journal: Information Processing & Management, Vol. 63, Issue 1, Article 104351 (Impact Factor 6.9; JCR Q1, Computer Science, Information Systems)
Publication date: 2025-08-22
URL: https://www.sciencedirect.com/science/article/pii/S0306457325002924
Citations: 0
Abstract
Early prediction of student performance (EPSP) has garnered significant attention due to its educational value, especially its importance in academic early warning systems. State-of-the-art data mining methods have achieved remarkable success by optimizing feature selection and enhancing model design. However, these methods often face challenges, including the cold-start problem, limited exploration of the intrinsic relationships among features, and poor generalization. In this work, we explore the utilization of Large Language Models (LLMs) as information integrators to address these challenges and propose a novel model called Large Language Model Empowered Early Prediction of Student Performance (LLM-EPSP). Specifically, for the cold-start problem, LLM-EPSP benefits from the inherent advantages of LLMs, which stem from their extensive pretraining on diverse datasets. This enables the model to make informed predictions even with limited initial data. For exploring intrinsic relationships among features, LLM-EPSP employs feature fusion techniques to uncover underlying connections between various features, ensuring a comprehensive and robust analysis. To enhance the generalization capabilities of LLM-EPSP, we develop predefined templates that facilitate its adaptation to a wide range of educational contexts. We evaluate our method on two real-world datasets: (1) OULAD, which includes data on 22 courses and 32,593 students, and (2) the UCI Machine Learning Repository, which contains 23 types of features from 649 students. Extensive validation demonstrates that LLM-EPSP considerably outperforms baseline approaches across diverse scenarios. Further analysis results also demonstrate the robustness and versatility of LLM-EPSP, suggesting its enormous potential in practical applications.
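To make the template idea concrete: one common way to let an LLM consume tabular student records is to serialize each record into a natural-language prompt through a predefined template, which is the general approach the abstract describes. The sketch below illustrates this pattern only; the field names, the template wording, and the pass/fail question are hypothetical and are not taken from the paper.

```python
# Illustrative sketch of template-based feature fusion for EPSP-style
# prediction: heterogeneous tabular features are serialized into a single
# natural-language prompt that an LLM could then classify. All feature
# names and template text here are invented for illustration.

TEMPLATE = (
    "A student has made {clicks} clicks on the course platform, "
    "scored {avg_score:.1f} on average across assessments, and attended "
    "{attendance:.0%} of sessions. Will the student pass the course?"
)

def build_prompt(features: dict) -> str:
    """Fuse a record of student features into one textual prompt."""
    return TEMPLATE.format(**features)

if __name__ == "__main__":
    example = {"clicks": 1240, "avg_score": 72.5, "attendance": 0.85}
    print(build_prompt(example))
```

In a full pipeline, prompts like this would be fed to a pretrained LLM, whose broad pretraining is what the abstract credits with mitigating the cold-start problem when little course-specific interaction data is available.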
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.