Interpretable machine learning for individualized sex estimation from long bones
Siam Knecht, Paolo Morandini, Lucie Biehler-Gomez, Yann Ardagna, Marie Perrin, Cristina Cattaneo, Christophe Roman, Pascal Adalian
International Journal of Legal Medicine (Impact Factor 2.3; JCR Q1, Medicine, Legal)
Published: 2025-10-25
DOI: 10.1007/s00414-025-03635-7 (https://doi.org/10.1007/s00414-025-03635-7)
Citations: 0
Abstract
Sex estimation is an essential task in forensic anthropology. It is crucial for identifying individuals from skeletal remains, and it also improves the reliability of other biological profile estimates, such as age and stature, some of which perform better when sex is taken into account. This study investigates the application of machine learning (ML) techniques to sex estimation, with a particular focus on interpretability to address the "black box" challenge inherent in AI models. Using a diverse dataset of long bone measurements from 2,969 individuals, 12 different ML algorithms were evaluated. Missing data were handled using iterative regression imputation, though challenges arising from incomplete datasets underscored the need for improved data handling strategies. Linear Discriminant Analysis (LDA) emerged as the most accurate approach, achieving 95.2% accuracy. A key feature of this study is the integration of SHapley Additive exPlanations (SHAP) values, which provide individualized insights into the factors influencing each prediction. This interpretability framework ensures transparency and addresses legal and scientific concerns about the admissibility of AI-generated evidence in court. Indeed, the possibility of misclassification highlights the importance of clear, understandable models in forensic applications. The study emphasizes the significance of individualized prediction, illustrated by the probability of male or female classification for each individual, as well as the impact of missing values on prediction accuracy. This research demonstrates that ML models can effectively balance accuracy with interpretability, offering personalized, actionable insights for forensic investigations. It paves the way for AI-driven methods that meet both scientific rigor and legal standards, transforming sex estimation in forensic science by providing individualized, defensible evidence suitable for court.
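To make the idea of an individualized, explainable prediction concrete, the following is a minimal sketch and not the authors' pipeline. It fits a two-class LDA by hand on a tiny synthetic dataset, reports the posterior probability of each sex for one individual, and splits the log-odds into per-feature contributions of the form w_i (x_i - mean_i). For a linear model this is the closed form that SHAP values take when features are assumed independent. The measurement names, the values, and the helper functions (`fit_lda`, `explain`) are all hypothetical, and imputation of missing values is left out.

```python
import math

# Synthetic, made-up measurements in mm (femur length, humeral head diameter).
# These are NOT data from the study.
FEMALE = [(412.0, 40.1), (425.0, 41.5), (418.0, 39.8), (430.0, 42.0), (409.0, 40.6)]
MALE = [(455.0, 46.2), (462.0, 47.5), (448.0, 45.1), (470.0, 48.3), (451.0, 46.9)]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(groups, means):
    """Within-class covariance shared by both classes (2x2 here)."""
    d = len(means[0])
    cov = [[0.0] * d for _ in range(d)]
    for g, mu in zip(groups, means):
        for r in g:
            for i in range(d):
                for j in range(d):
                    cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j])
    dof = sum(len(g) for g in groups) - len(groups)
    return [[c / dof for c in row] for row in cov]


def invert_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fit_lda(female, male):
    """Return weights w and intercept b of the log-odds log(P(male)/P(female))."""
    mu_f, mu_m = mean_vector(female), mean_vector(male)
    inv = invert_2x2(pooled_covariance([female, male], [mu_f, mu_m]))
    diff = [m - f for m, f in zip(mu_m, mu_f)]
    w = [sum(inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
    mid = [(m + f) / 2 for m, f in zip(mu_m, mu_f)]
    log_prior_ratio = math.log(len(male) / len(female))
    b = log_prior_ratio - sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def explain(x, w, b, background):
    """Return P(male), the base log-odds, and per-feature contributions."""
    center = mean_vector(background)
    base = b + sum(wi * ci for wi, ci in zip(w, center))
    # Linear-model attribution under the feature-independence assumption.
    contrib = [wi * (xi - ci) for wi, xi, ci in zip(w, x, center)]
    log_odds = base + sum(contrib)
    p_male = 1.0 / (1.0 + math.exp(-log_odds))
    return p_male, base, contrib


w, b = fit_lda(FEMALE, MALE)
individual = (440.0, 45.0)  # hypothetical case to classify
p_male, base, contrib = explain(individual, w, b, FEMALE + MALE)
print(f"P(male) = {p_male:.3f}, P(female) = {1 - p_male:.3f}")
for name, c in zip(("femur_length", "humeral_head_diameter"), contrib):
    print(f"{name}: {c:+.2f} log-odds")
```

The contributions sum, together with the base value, to the model's log-odds for that individual, which is the additivity property that makes a per-case explanation auditable. Because the two synthetic measurements are correlated, a feature's weight can take an unintuitive sign. With real data one would more likely rely on an established LDA implementation and a SHAP library than on this hand-rolled version.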
About the journal:
The International Journal of Legal Medicine aims to improve the scientific resources used in the elucidation of crime and related forensic applications at a high level of evidential proof. The journal offers review articles tracing development in specific areas, with up-to-date analysis; original articles discussing significant recent research results; case reports describing interesting and exceptional examples; population data; letters to the editors; and technical notes, which appear in a section originally created for rapid publication of data in the dynamic field of DNA analysis.