Prediction of Recidivism and Detection of Risk Factors Under Different Time Windows Using Machine Learning Techniques
Di Mu, Simai Zhang, Ting Zhu, Yong Zhou, Wei Zhang
Social Science Computer Review, published 2024-01-12 (Journal Article)
DOI: 10.1177/08944393241226607
Abstract
Following a comprehensive analysis of the first three generations of prisoner risk assessment tools, the field has increasingly integrated fourth-generation tools with machine learning techniques. However, few efforts have addressed the explainability of data-driven prediction models or their connection to treatment recommendations. Our primary objective was to develop models that predict the likelihood of recidivism among prisoners within 1 year, 2 years, and 5 years of release from their index incarceration, and to enhance the models' interpretability using SHapley Additive exPlanations (SHAP). We collected 20,457 in-prison records dated from February 10, 2005, to August 25, 2021, from the data management system of a prison in Southwestern China. Recidivism was officially determined by mining data from an official website and combining it with identification data from neighboring prisons. We employed five machine learning algorithms, using sociodemographic characteristics, physical health, psychological assessments, criminological characteristics, criminal history, social support, and in-prison behavior as features, and applied SHAP to reveal each feature's contribution to the predictions. The findings indicated that young prisoners accused of larceny, as well as prisoners with previous convictions, lower fines, and limited family support, faced a higher risk of reoffending. Conversely, middle-aged and older prisoners with no prior convictions, lower monthly supermarket expenses, and positive psychological test results had a lower risk of reoffending. We also explored interactions between significant predictive features, such as age at the start of incarceration and primary accusation, and the duration of the current incarceration and the cumulative number of prior incarcerations. Our models consistently performed well across all three time windows, as measured by the area under the ROC curve (AUC) on the test dataset. The interpretability results offer insight into how risk factors evolve over time, which is valuable for intervening with high-risk individuals. With additional validation, these insights could provide stakeholders with dynamic information about prisoners. Moreover, the interpretability results can be integrated into prison and court management systems as a risk assessment tool.
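To make the SHAP step more concrete, the following is a minimal sketch of exact Shapley attribution for a single prediction. It assumes a hypothetical three-feature toy risk score (`toy_risk`, not the authors' model), illustrative feature names and coefficients, and a single reference record (`baseline`) against which "absent" features are filled in. Production SHAP implementations, such as the `shap` Python package, typically average over a background dataset and use faster model-specific algorithms; this brute-force version enumerates every coalition and is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Record = Dict[str, float]


def toy_risk(x: Record) -> float:
    """Hypothetical toy risk score (NOT the authors' model)."""
    score = 0.5
    score -= 0.01 * (x["age"] - 30)            # older -> lower score
    score += 0.10 * x["prior_convictions"]     # more priors -> higher score
    score -= 0.15 * x["family_support"]        # support (0/1) -> lower score
    if x["age"] < 30:                          # toy age x priors interaction
        score += 0.05 * x["prior_convictions"]
    return score


def shapley_values(model: Callable[[Record], float],
                   instance: Record, baseline: Record) -> Dict[str, float]:
    """Exact Shapley values relative to a single baseline record.

    A feature present in a coalition takes the instance's value; an absent
    feature takes the baseline's value. Cost grows as 2**n in the number
    of features, so this is only practical for very small n.
    """
    features = list(instance)
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 (coalitions exclude f)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                present = set(coalition)
                without_f = {g: instance[g] if g in present else baseline[g]
                             for g in features}
                with_f = {**without_f, f: instance[f]}  # add f to coalition
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical records: a young prisoner with two priors and no family
# support, compared against an older reference prisoner.
instance = {"age": 22, "prior_convictions": 2, "family_support": 0}
baseline = {"age": 40, "prior_convictions": 0, "family_support": 1}

phi = shapley_values(toy_risk, instance, baseline)
print({name: round(value, 4) for name, value in phi.items()})
# {'age': 0.23, 'prior_convictions': 0.25, 'family_support': 0.15}
print(round(sum(phi.values()), 4),
      round(toy_risk(instance) - toy_risk(baseline), 4))  # 0.63 0.63
```

The attributions sum to the gap between the instance's score and the baseline's score (the efficiency property that SHAP relies on). The 0.10 contributed by the toy age-by-priors interaction is split evenly between the two features, so a single per-feature attribution folds interaction effects into the main effects. This is one reason to inspect feature interactions separately.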