Prediction of Smartphone Addiction Among Korean Adolescents Based on Physical Activity and Mental Health: A Machine Learning Analysis Using LASSO and SHAP From the Korea Youth Risk Behavior Survey.
Authors: Kihyuk Lee, Wooin Seo, Se Young Jung
Journal: Alpha psychiatry, 27(1), article 46201. Journal Article, eCollection. Published 2026-02-26.
DOI: 10.31083/AP46201 (https://doi.org/10.31083/AP46201)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12957968/pdf/
Abstract
Background: Adolescent smartphone overuse is associated with physical inactivity and mental health problems, such as anxiety. However, few studies have analyzed these factors jointly using both linear and non-linear methods. This study aimed to predict smartphone addiction using physical activity and mental health indicators from the 2020 and 2023 Korea Youth Risk Behavior Survey, applying Least Absolute Shrinkage and Selection Operator (LASSO), multiple machine learning models, and SHapley Additive exPlanations (SHAP) analysis.
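For readers unfamiliar with LASSO, its general form can be written as a penalized estimation problem. The notation below is an assumption made for illustration: the abstract does not state which loss function or penalty strength the authors used.

```latex
% Generic LASSO objective (notation assumed, not taken from the paper):
%   y_i     outcome for adolescent i (e.g. general vs. combined-risk user)
%   x_i     vector of p candidate predictors
%   \ell    loss function (squared error, or logistic loss for a binary outcome)
%   \lambda penalty strength, \lambda \ge 0
\[
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
\]
```

The absolute-value penalty shrinks coefficients toward zero and sets some of them exactly to zero, which is why LASSO can serve as a variable-selection step: predictors with a nonzero coefficient are retained.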
Methods: A total of 86,744 adolescents were classified into general (n = 63,963), potential-risk (n = 20,383), and high-risk (n = 2,398) smartphone user groups. For the binary classification, general users were compared with combined-risk users (potential-risk and high-risk groups merged; n = 22,781). Twelve key predictors were selected using LASSO. Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) models were implemented with Synthetic Minority Over-sampling Technique (SMOTE) class balancing; SHAP was used to compare variable importance across models.
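The binary split and the oversampling idea described in the Methods can be sketched as follows. This is a minimal, standard-library illustration of SMOTE-style interpolation, not the authors' pipeline: the function name `smote_sketch`, the value of `k`, and the toy rows are all hypothetical, and only the three group sizes come from the abstract.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Group sizes reported in the abstract.
n_general, n_potential, n_high = 63_963, 20_383, 2_398
n_risk = n_potential + n_high               # combined-risk class
risk_share = n_risk / (n_general + n_risk)  # minority share before balancing

# Hypothetical toy minority rows with two made-up features (not survey data).
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(toy_minority, n_new=6, k=2)

print(n_risk, round(risk_share, 4))
print(len(new_rows))
```

With the reported counts, the combined-risk class holds 22,781 of 86,744 respondents, roughly 26% of the sample, which is the imbalance that oversampling is meant to offset. The abstract does not say at which stage balancing was applied; in common practice, synthetic rows are generated from the training split only so that evaluation data stay untouched.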
Results: LASSO identified moderate physical activity (β = -0.156), strength physical activity (β = -0.149), loneliness (β = 0.144), smartphone usage time (β = 0.085), and anxiety (β = 0.078) as major predictors. Random Forest and Logistic Regression showed the best recall (0.63 and 0.60, respectively). LightGBM had the highest accuracy (0.726) and the highest Area Under the Receiver Operating Characteristic Curve (AUROC; 0.7108), whereas XGBoost showed the lowest AUROC (0.5621). SHAP consistently ranked anxiety and smartphone usage time as the top predictors, with sleep and physical activity showing variable importance.
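The three metrics reported in the Results (recall, accuracy, and AUROC) can be computed as in the sketch below. The labels and scores are hypothetical toy values chosen for illustration, not survey data or model output, and the 0.5 decision threshold is an assumption.

```python
def recall(y_true, y_pred):
    """Share of true positives (risk users) that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = combined-risk user, 0 = general user.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.4, 0.55, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(recall(y_true, y_pred))             # 0.75
print(accuracy(y_true, y_pred))           # 0.7
print(round(auroc(y_true, scores), 4))    # 0.8333

# A classifier that labels everyone "general" still scores 0.6 accuracy
# on this toy sample while flagging no risk users at all.
all_general = [0] * len(y_true)
print(accuracy(y_true, all_general), recall(y_true, all_general))  # 0.6 0.0
```

The last two lines show why recall and AUROC are worth reading alongside accuracy when classes are imbalanced: accuracy rewards correct calls on the majority class, whereas recall measures only how many at-risk users are caught, and AUROC summarizes ranking quality across all thresholds.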
Conclusions: Anxiety and smartphone usage time were consistently dominant predictors. Physical activity variables contributed in some models but showed inconsistent importance. These findings highlight the central role of mental health, with behavioral factors playing a secondary, model-specific role.
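SHAP attributions are based on Shapley values, which divide a single prediction among the input features. The sketch below computes exact Shapley values for a three-feature toy model by enumerating every feature ordering. The scoring function, its weights, the input row, and the all-zero baseline are hypothetical and unrelated to the fitted models in the study; practical SHAP implementations use faster model-specific or approximate algorithms rather than brute-force enumeration.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for one row via all feature orderings.

    A feature that is "absent" takes its baseline value; this is one
    common convention, assumed here for illustration.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]        # switch feature j from baseline to actual
            new = model(current)
            phi[j] += new - prev     # marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]


def toy_model(row):
    """Hypothetical risk score with made-up weights and one interaction."""
    anxiety, usage_hours, activity_days = row
    return (0.5 * anxiety + 0.3 * usage_hours - 0.2 * activity_days
            + 0.1 * anxiety * usage_hours)


x = [2.0, 4.0, 1.0]          # hypothetical respondent
baseline = [0.0, 0.0, 0.0]   # assumed reference point
phi = shapley_values(toy_model, x, baseline)

print([round(p, 6) for p in phi])  # [1.4, 1.6, -0.2]
# Attributions sum to the gap between the prediction and the baseline output.
print(round(sum(phi), 6), round(toy_model(x) - toy_model(baseline), 6))  # 2.8 2.8
```

In this toy case the interaction term is split evenly between the two features involved, and the attributions sum to the difference between the prediction and the baseline output. Because attributions depend on the fitted model as well as the data, rankings of this kind can differ from one model to another, which is consistent with the model-specific importance the abstract reports for physical activity.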