Salience, legitimacy, and credibility of two end-to-end forensic single cell probabilistic systems

Impact factor 3.1 · Zone 2 (Medicine) · JCR Q2 (GENETICS & HEREDITY)
Qhawe A. Bhembe, Desmond S. Lun, Ken R. Duffy, Catherine M. Grgicak
Forensic Science International: Genetics, Volume 81, Article 103369. Published 2025-09-26.
DOI: 10.1016/j.fsigen.2025.103369
https://www.sciencedirect.com/science/article/pii/S1872497325001498

Abstract

In cases with no suspect, national forensic databases provide a mechanism for generating investigative leads. National forensic DNA databases, however, restrict what data may be uploaded. For example, uploading alleles inferred from DNA data that is a mixture of more than two contributors may be disallowed, leaving cases unresolved. A single-cell strategy has the potential to overcome this mixture gap by isolating each cell at the front end of the pipeline. Once DNA signatures from each cell are obtained, they are clustered into groups. For each cluster, we then assess the probability of observing the data in the cluster had a person carrying genotype g donated the cells. Applying Bayes' Rule, we obtain the probability of a genotype given the data in a cluster and the model. If this probability is near one, only one genotype reasonably explains the data, and this genotype can be used in a national database query. Good clustering is therefore an invaluable step in single-cell forensic interpretation, and for this reason we examine the fortitude of two clustering approaches, model-based clustering (MBC) and forensic-aware clustering (FAC), within an end-to-end single-cell predictor named EESCIt™. Using proper scoring rules, we report the performance of our probabilistic single-cell evaluator and structure the analytics into the categories of Salience, Legitimacy and Credibility (SLC). With Salience referring to the applicability of a technology to an actor's needs, we begin by discussing the relevance of single-cell reports to forensic actors. Regarding Legitimacy, we determined the proportion of admixtures giving correct and incorrect cluster numbers and found that FAC returned the correct cluster number for all admixtures tested. With improved clustering, 90% of the loci returned only one credible genotype, and it was the correct one, which improves on MBC's 84%.
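The Bayes' Rule step described above can be written out explicitly. The following is only a sketch under stated assumptions: the symbols D_c (the data in cluster c), M (the model), G (the set of candidate genotypes) and the prior P(g) are notation introduced here for illustration, and the abstract does not say which prior the authors use.

```latex
% Posterior probability of genotype g given the data in cluster c and model M.
% D_c, M, G and the prior P(g) are illustrative notation, not taken from the paper.
P(g \mid D_c, M) \;=\;
  \frac{P(D_c \mid g, M)\, P(g)}
       {\sum_{g' \in G} P(D_c \mid g', M)\, P(g')}
```

Here P(D_c | g, M) is the probability of observing the cluster's data had a person carrying genotype g donated. A posterior near one for a single g corresponds to the case in which only one genotype reasonably explains the data.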
We then examined the Brier Score and decomposed it into calibration and refinement. We show that the FAC-centered system returned better calibration scores than the MBC-centered one, an improvement driven by its better clustering performance. Regarding Credibility, we found that the FAC-based system also returned better refinement scores. With FAC being more Legitimate and Credible than an MBC system for single-cell forensics, we adopt it into EESCIt™, thereby creating the first end-to-end single-cell probabilistic system able to address single-cell queries about how many donors there were and who they were.
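To make the calibration and refinement terms concrete, here is a minimal Python sketch of the standard calibration-refinement decomposition of the Brier Score for binary events. It is not the authors' implementation. The function name `brier_decomposition` and the example numbers are hypothetical, and treating each reported genotype probability as a forecast of a yes/no event (was that genotype the true one?) is an assumption made here for illustration. Forecasts are grouped by their exact value, so the two terms sum to the Brier Score; lower is better for all three quantities.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (brier, calibration, refinement) for binary outcomes (0 or 1).

    Forecasts are grouped by their exact value, so that
    brier == calibration + refinement up to floating-point rounding.
    """
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")

    n = len(forecasts)
    # Mean squared difference between forecast probability and outcome.
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

    # Collect the outcomes observed for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    calibration = 0.0
    refinement = 0.0
    for f, observed in groups.items():
        n_k = len(observed)
        o_bar = sum(observed) / n_k  # observed frequency for this forecast value
        calibration += n_k * (f - o_bar) ** 2      # forecast vs. observed frequency
        refinement += n_k * o_bar * (1.0 - o_bar)  # spread of outcomes within the group

    return brier, calibration / n, refinement / n


# Hypothetical forecasts and outcomes, for illustration only.
example_forecasts = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
example_outcomes = [1, 1, 1, 0, 1, 0]
print(brier_decomposition(example_forecasts, example_outcomes))
```

The calibration term is zero when every stated probability matches the frequency with which the event occurs. The refinement term is small when the forecasts separate cases into groups whose outcomes are nearly all 0 or nearly all 1.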
Source journal
CiteScore: 7.50
Self-citation rate: 32.30%
Articles published: 132
Review time: 11.3 weeks
About the journal: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts. The scope of the journal includes: Forensic applications of human polymorphism. Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies. Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms. Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications. Non-human DNA polymorphisms for crime scene investigation. Population genetics of human polymorphisms of forensic interest. Population data, especially from DNA polymorphisms of interest for the solution of forensic problems. DNA typing methodologies and strategies. Biostatistical methods in forensic genetics. Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches. Standards in forensic genetics. Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards. Quality control. Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies. Criminal DNA databases. Technical, legal and statistical issues. General ethical and legal issues related to forensic genetics.