Rejoice Chitengu, Silas Formunyuy Verkijika, Kelibone Eva Mamabolo
Forecasting Civil Unrest in South Africa Using Social Media Data: A Hybrid Machine Learning Approach
Social Science Computer Review (journal article), published 2025-06-09
DOI: 10.1177/08944393251349542 (https://doi.org/10.1177/08944393251349542)
Abstract
Civil unrest, encompassing protests and riots, is a growing global concern, with incidents rising at an alarming rate, a trend also observed in South Africa over the years. The issue is particularly pronounced in today’s social media era, where platforms like ‘X’ (formerly Twitter) serve as powerful tools for mobilization. This raises the question: What factors drive civil unrest, and how can machine learning, using social media data, be employed to forecast such events? In response, this study aimed to develop a hybrid machine learning model to forecast protest and riot events in South Africa using Twitter data. Following the CRISP-DM methodology, data were collected from Twitter for the period between 2019 and 2024, resulting in 18,487 curated tweets, with associated ground truth data extracted from the ACLED database. Using these data, a hybrid model combining Bidirectional LSTM (Bi-LSTM) networks with eXtreme Gradient Boosting (XGBoost) was developed for classification and regression tasks to forecast civil unrest in South Africa. Additionally, SHapley Additive exPlanations (SHAP) were used for model explainability. The proposed model outperformed the base model, achieving R-squared values of 33% for protests and 23% for riots in regression, along with classification accuracies of 92% for protests and 86.2% for riots. SHAP results indicated that the key predictors of unrest included sentiment-related features, tweet engagement features, regional factors, the day of the week, public holidays, and the topics being discussed. This study demonstrates the value of a hybrid model in forecasting civil unrest events and identifies key features that stakeholders can use to target their efforts more precisely, so that resources are allocated where they are needed most. The study concludes with a discussion of insights for stakeholders on how to leverage social media data to predict and mitigate civil unrest.
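The two kinds of score reported in the abstract, R-squared for the regression task and accuracy for the classification task, can be illustrated with a minimal sketch. The function names and the small arrays below are hypothetical and chosen only for illustration; they are not the study's data, code, or results, and the authors' actual evaluation pipeline is not described here.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Hypothetical daily event counts (regression target), illustrative only.
daily_counts_true = [2.0, 0.0, 5.0, 1.0, 3.0]
daily_counts_pred = [1.0, 1.0, 4.0, 2.0, 3.0]

# Hypothetical "unrest occurred" flags (classification target), illustrative only.
unrest_true = [1, 0, 1, 1, 0]
unrest_pred = [1, 0, 0, 1, 0]

print(f"R-squared: {r_squared(daily_counts_true, daily_counts_pred):.3f}")
print(f"Accuracy:  {accuracy(unrest_true, unrest_pred):.3f}")
```

Read this way, an R-squared of 33% means the regression explains about a third of the variance in event counts relative to always predicting the mean, while an accuracy of 92% means 92 of every 100 classification predictions match the ground truth label.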
About the journal:
Unique Scope: Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as societal impacts of information technology. Topics include: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, and world-wide web resources for social scientists.
Interdisciplinary Nature: Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.