V. Selvakumar, Nadipi Keerthana Reddy, R. Sree Vardhini Tulasi, Kunchala Rohit Kumar
Data-Driven Insights into Social Media Behavior Using Predictive Modeling
Procedia Computer Science, Volume 252, Pages 480-489. Published 2025-01-01. DOI: 10.1016/j.procs.2025.01.007
https://www.sciencedirect.com/science/article/pii/S1877050925000079
Citations: 0
Abstract
This study proposes a statistical machine learning approach to predict social media usage across various demographic categories in India. The dataset comprises twenty-six features, including demographic attributes (age, gender, education, location), social media engagement metrics (number of followers, posts, time spent on platforms), and device-related information. The dataset reflects real-world social media behavior on platforms such as WhatsApp, Facebook, and Instagram, and captures distinct patterns of weekday and weekend usage. Key variables such as time spent on each platform, the number of Instagram posts and followers, and overall social media usage were analyzed in detail. Feature engineering identified significant predictors of the user status categories: Sabbatical, Self-Employed, Student, and Working Professional. Several regression models (Linear Regression, K-Nearest Neighbors, Decision Tree Regression, Random Forest Regression, Gradient Boosting, Naïve Bayes, and Support Vector Regression) were evaluated on their performance in predicting user status. Comparative analysis showed that the Gradient Boosting algorithm outperformed the other models, achieving the highest accuracy. The machine learning workflow encompassed data pre-processing, feature engineering, model training, and evaluation, all implemented in Python. The study contributes in two ways: it identifies the key predictors of social media engagement, and it evaluates feature importance alongside a comparative analysis of predictive models.
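The workflow named in the abstract (pre-processing, model training, and comparison by accuracy) can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not give the dataset, the library, or the hyperparameters, so everything below is an assumption. The data is synthetic, the three features (age, weekday hours, weekend hours) and their per-status centres are hypothetical, and only a majority-class baseline and a small k-nearest-neighbours model are compared rather than the seven models listed. User status is treated as a categorical target here, since the abstract reports accuracy.

```python
import random
from collections import Counter
from math import dist

STATUSES = ["Sabbatical", "Self-Employed", "Student", "Working Professional"]


def make_synthetic_users(n, seed=0):
    """Return (features, status) rows. Entirely synthetic, not the paper's data."""
    rng = random.Random(seed)
    # Hypothetical centres per status: (age, weekday hours, weekend hours).
    centres = {
        "Sabbatical": (35.0, 4.0, 4.5),
        "Self-Employed": (40.0, 2.5, 3.0),
        "Student": (20.0, 5.0, 6.0),
        "Working Professional": (30.0, 1.5, 3.5),
    }
    rows = []
    for _ in range(n):
        status = rng.choice(STATUSES)
        age, weekday, weekend = centres[status]
        features = (rng.gauss(age, 3.0), rng.gauss(weekday, 0.7), rng.gauss(weekend, 0.7))
        rows.append((features, status))
    return rows


def fit_scaler(rows):
    """Pre-processing: per-feature mean and standard deviation from training rows."""
    columns = list(zip(*(features for features, _ in rows)))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        (sum((x - m) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def scale(features, means, stds):
    """Standardise one feature tuple with the fitted statistics."""
    return tuple((x - m) / s for x, m, s in zip(features, means, stds))


def majority_baseline(train):
    """Model that always predicts the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def knn_classifier(train, k=5):
    """Model that votes among the k nearest training rows (Euclidean distance)."""
    def predict(features):
        nearest = sorted(train, key=lambda row: dist(row[0], features))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def accuracy(model, test):
    """Fraction of held-out rows whose status is predicted correctly."""
    return sum(model(features) == label for features, label in test) / len(test)


def compare_models(n=400, seed=0):
    """Split, scale, train both models, and return held-out accuracy per model."""
    rows = make_synthetic_users(n, seed)
    split = int(0.75 * len(rows))
    train_raw, test_raw = rows[:split], rows[split:]
    means, stds = fit_scaler(train_raw)  # fitted on training rows only
    train = [(scale(f, means, stds), y) for f, y in train_raw]
    test = [(scale(f, means, stds), y) for f, y in test_raw]
    models = {
        "majority baseline": majority_baseline(train),
        "5-nearest neighbours": knn_classifier(train, k=5),
    }
    return {name: accuracy(model, test) for name, model in models.items()}


if __name__ == "__main__":
    for name, acc in sorted(compare_models().items(), key=lambda kv: -kv[1]):
        print(f"{name}: {acc:.3f}")
```

The scaler is fitted on the training split only, so no information from the held-out rows leaks into pre-processing. The same `compare_models` loop would extend to further models by adding entries to the `models` dictionary. The numbers it prints describe the synthetic data only and say nothing about the study's reported results.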