A fairness scale for real-time recidivism forecasts using a national database of convicted offenders.

IF 4.5 · CAS Zone 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Neural Computing & Applications · Pub Date: 2025-01-01 · Epub Date: 2025-08-01 · DOI: 10.1007/s00521-025-11478-x
Jacob Verrey, Peter Neyroud, Lawrence Sherman, Barak Ariel
Volume 37 (26), pages 21607-21657 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401775/pdf/ · Citations: 0

Abstract

This investigation explores whether machine learning can predict recidivism while addressing societal biases. To investigate this, we obtained conviction data from the UK's Police National Computer (PNC) covering 346,685 records between January 1, 2000, and February 3, 2006 (His Majesty's Inspectorate of Constabulary, Use of the Police National Computer: An inspection of the ACRO Criminal Records Office, Birmingham, https://assets-hmicfrs.justiceinspectorates.gov.uk/uploads/police-national-computer-use-acro-criminal-records-office.pdf, 2017). We generate twelve machine learning models (six forecasting general recidivism and six forecasting violent recidivism) over a 3-year follow-up period, evaluated via fivefold cross-validation. Our best-performing models outperform the existing state of the art, achieving area under the curve (AUC) scores of 0.8660 and 0.8375 for general and violent recidivism, respectively. Next, we construct a fairness scale that communicates the semantic and technical trade-offs associated with debiasing a criminal justice forecasting model. We use this scale to debias our best-performing models. Results indicate that both models can achieve all five fairness definitions, because the metrics measuring these definitions (the statistical range of recall, precision, positive rate, and error balance between demographic groups) show that these scores fall within one percentage point of each other. Deployment recommendations and implications are discussed. These include recommended safeguards against false positives, an explication of how these models addressed societal biases, and a case study illustrating how these models can improve existing criminal justice practices. That is, these models may help police identify fewer people, in a way less impacted by structural bias, while still reducing crime. A randomized controlled trial is proposed to test this illustrated case study, and further directions are explored.
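The fairness check the abstract describes, that the spread of recall, precision, positive rate, and error rates across demographic groups stays within one percentage point, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the choice of per-group metrics, and the interpretation of "one percentage point" as a 0.01 cap on each metric's max-min range are assumptions.

```python
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Compute recall, precision, positive rate, and false-positive rate
    separately for each demographic group."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        tp = np.sum((yt == 1) & (yp == 1))
        fp = np.sum((yt == 0) & (yp == 1))
        fn = np.sum((yt == 1) & (yp == 0))
        tn = np.sum((yt == 0) & (yp == 0))
        out[g] = {
            "recall": tp / (tp + fn),
            "precision": tp / (tp + fp),
            "positive_rate": (tp + fp) / len(yt),
            "fpr": fp / (fp + tn),
        }
    return out

def within_one_point(metrics, keys=("recall", "precision", "positive_rate", "fpr")):
    """True if, for every metric, the spread (max - min) across groups
    is at most one percentage point (0.01)."""
    for k in keys:
        vals = [m[k] for m in metrics.values()]
        if max(vals) - min(vals) > 0.01:
            return False
    return True
```

On this reading, a debiased model passes when `within_one_point` returns True for every metric tied to a fairness definition; a single metric spreading beyond 0.01 between any two groups fails the check.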

Supplementary information: The online version contains supplementary material available at 10.1007/s00521-025-11478-x.

Source journal
Neural Computing & Applications (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 11.40
Self-citation rate: 8.30%
Articles per year: 1280
Review time: 6.9 months
Journal introduction: Neural Computing & Applications is an international journal which publishes original research and other information in the field of practical applications of neural computing and related techniques such as genetic algorithms, fuzzy logic and neuro-fuzzy systems. All items relevant to building practical systems are within its scope, including but not limited to:

- adaptive computing
- algorithms
- applicable neural networks theory
- applied statistics
- architectures
- artificial intelligence
- benchmarks
- case histories of innovative applications
- fuzzy logic
- genetic algorithms
- hardware implementations
- hybrid intelligent systems
- intelligent agents
- intelligent control systems
- intelligent diagnostics
- intelligent forecasting
- machine learning
- neural networks
- neuro-fuzzy systems
- pattern recognition
- performance measures
- self-learning systems
- software simulations
- supervised and unsupervised learning methods
- system engineering and integration

Featured contributions fall into several categories: Original Articles, Review Articles, Book Reviews and Announcements.