Understanding the development, performance, fairness, and transparency of machine learning models used in child protection prediction: A systematic review.

Impact Factor: 3.4 · CAS Tier 2 (Psychology) · JCR Q1 (Family Studies)
Claudia Bull, Steve Kisely, Kim Betts, Yanan Hu
{"title":"Understanding the development, performance, fairness, and transparency of machine learning models used in child protection prediction: A systematic review.","authors":"Claudia Bull, Steve Kisely, Kim Betts, Yanan Hu","doi":"10.1016/j.chiabu.2025.107630","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To understand the development and validation of contemporary machine learning (ML) models for child protection prediction, their performance evaluation, integration of fairness, and operationalisation of model explainability and transparency.</p><p><strong>Methods: </strong>This systematic review followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. Model transparency was assessed against the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + Artificial Intelligence (TRIPOD+AI) criteria, while study risk of bias and model applicability were evaluated using Prediction model Risk Of Bias ASsessment Tool (PROBAST) criteria.</p><p><strong>Results: </strong>Eleven studies were identified, employing various ML approaches such as supervised classification models (e.g., binary classification, decision trees, support vector machines), regression models, and ensemble methods. These models utilised administrative health, child welfare, and criminal/court data. Performance was evaluated using a range of discrimination, classification, and calibration metrics, yielding variable results. Only four models incorporated group fairness, focusing on race/ethnicity as the protected attribute. Explainability and transparency were enhanced through Receiver Operating Curves, Precision-Recall Curves, feature importance plots, and SHapley Additive exPlanations (SHAP) plots. According to TRIPOD+AI criteria, only four studies reported likely reproducible models. Based on PROBAST criteria, all studies had unclear or high risk of bias.</p><p><strong>Conclusions: </strong>This is the first review to use TRIPOD+AI and PROBAST criteria to assess the risk of bias and transparency of ML models in child protection prediction. The findings reveal that the field remains methodologically immature, with many models lacking fair, transparent, and reproducible methods. Adoption of advanced fairness techniques (beyond fairness-through-unawareness), stakeholder involvement in model development and validation, and transparency through data and code sharing will be essential for the ethical and effective design of ML models, ultimately improving decision-making processes and outcomes for vulnerable children and families.</p>","PeriodicalId":51343,"journal":{"name":"Child Abuse & Neglect","volume":"169 Pt 1","pages":"107630"},"PeriodicalIF":3.4000,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Child Abuse & Neglect","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1016/j.chiabu.2025.107630","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"FAMILY STUDIES","Score":null,"Total":0}
Citations: 0

Abstract

Objective: To understand the development and validation of contemporary machine learning (ML) models for child protection prediction, their performance evaluation, integration of fairness, and operationalisation of model explainability and transparency.

Methods: This systematic review followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. Model transparency was assessed against the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + Artificial Intelligence (TRIPOD+AI) criteria, while study risk of bias and model applicability were evaluated using Prediction model Risk Of Bias ASsessment Tool (PROBAST) criteria.

Results: Eleven studies were identified, employing various ML approaches such as supervised classification models (e.g., binary classifiers, decision trees, support vector machines), regression models, and ensemble methods. These models utilised administrative health, child welfare, and criminal/court data. Performance was evaluated using a range of discrimination, classification, and calibration metrics, yielding variable results. Only four models incorporated group fairness, focusing on race/ethnicity as the protected attribute. Explainability and transparency were enhanced through Receiver Operating Characteristic (ROC) curves, precision-recall curves, feature importance plots, and SHapley Additive exPlanations (SHAP) plots. According to TRIPOD+AI criteria, only four studies reported likely reproducible models. Based on PROBAST criteria, all studies had an unclear or high risk of bias.
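The metrics named above map onto standard library calls. Below is a minimal sketch of such an evaluation, assuming a fitted binary classifier with predicted probabilities y_prob and ground-truth labels y_true; the variable names, threshold, and metric choices are illustrative and not taken from any of the reviewed studies.

```python
# Minimal sketch of the evaluation described above: a discrimination metric
# (AUROC), a calibration metric (Brier score), and classification metrics at
# a chosen decision threshold. All names are illustrative, not drawn from
# the reviewed studies.
import numpy as np
from sklearn.metrics import (
    roc_auc_score,            # discrimination: area under the ROC curve
    average_precision_score,  # summary of the precision-recall curve
    brier_score_loss,         # calibration: mean squared error of probabilities
    precision_score,
    recall_score,
)

def evaluate(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> dict:
    """Return discrimination, calibration, and classification metrics."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "auroc": roc_auc_score(y_true, y_prob),
        "auprc": average_precision_score(y_true, y_prob),
        "brier": brier_score_loss(y_true, y_prob),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
```

For the curve-based plots, sklearn.metrics.roc_curve and precision_recall_curve yield the coordinates directly, and the SHAP plots mentioned above are typically produced with the shap package (e.g., shap.TreeExplainer plus shap.summary_plot for tree ensembles).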

Conclusions: This is the first review to use TRIPOD+AI and PROBAST criteria to assess the risk of bias and transparency of ML models in child protection prediction. The findings reveal that the field remains methodologically immature, with many models lacking fair, transparent, and reproducible methods. Adoption of advanced fairness techniques (beyond fairness-through-unawareness), stakeholder involvement in model development and validation, and transparency through data and code sharing will be essential for the ethical and effective design of ML models, ultimately improving decision-making processes and outcomes for vulnerable children and families.
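To make the fairness conclusion concrete: fairness-through-unawareness merely omits the protected attribute from the feature set, which does nothing about disparities transmitted through correlated proxy features, so group disparities still have to be measured. The sketch below computes two standard group-fairness gaps; these are common definitions, not necessarily the exact metrics used in the reviewed studies, and all variable names are hypothetical.

```python
# Illustrative sketch (not code from the reviewed studies): measuring group
# fairness directly across levels of a protected attribute such as
# race/ethnicity, rather than relying on fairness-through-unawareness.
# Assumes each group contains both outcome classes.
import numpy as np

def fairness_gaps(y_true: np.ndarray, y_pred: np.ndarray, group: np.ndarray) -> dict:
    """Max-min gap in selection rate (demographic parity) and in
    TPR/FPR (equalized odds) across protected-attribute groups."""
    sel, tpr, fpr = [], [], []
    for g in np.unique(group):
        m = group == g
        sel.append(y_pred[m].mean())                  # selection rate
        tpr.append(y_pred[m][y_true[m] == 1].mean())  # true positive rate
        fpr.append(y_pred[m][y_true[m] == 0].mean())  # false positive rate
    def gap(xs):
        return float(max(xs) - min(xs))
    return {"demographic_parity_gap": gap(sel),
            "equalized_odds_gap": max(gap(tpr), gap(fpr))}

# Usage with synthetic data:
rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=1000)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
print(fairness_gaps(y_true, y_pred, group))
```

Packaged versions of these quantities are available in libraries such as fairlearn (demographic_parity_difference, equalized_odds_difference).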

Source journal: Child Abuse & Neglect
CiteScore: 7.40
Self-citation rate: 10.40%
Articles published: 397
About the journal: Official publication of the International Society for the Prevention of Child Abuse and Neglect. Child Abuse & Neglect: The International Journal provides an international, multidisciplinary forum on all aspects of child abuse and neglect, with special emphasis on prevention and treatment; its scope extends to all aspects of life that either favor or hinder child development. While contributions come primarily from the fields of psychology, psychiatry, social work, medicine, nursing, law enforcement, legislation, education, and anthropology, the journal also encourages contributions from concerned lay individuals and child-oriented advocacy organizations.