Commentary on: Thompson WC. Uncertainty in probabilistic genotyping of low template DNA: A case study comparing STRmix™ and TrueAllele®. J Forensic Sci. 2023;68(3):1049–63

IF 1.5 · Zone 4, Medicine · JCR Q2, Medicine, Legal

Journal of Forensic Sciences · Pub Date: 2024-04-25 · DOI: 10.1111/1556-4029.15518

Mark W. Perlin PhD, MD, PhD, Nasir Butt PhD, Mark R. Wilson PhD

J Forensic Sci. 2024;69(4):1516–1518. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1111/1556-4029.15518

Citations: 0

Abstract

This Letter is a response to “Uncertainty in probabilistic genotyping of low template DNA: A case study comparing STRmix™ and TrueAllele®,” a Journal of Forensic Sciences (JFS) Case Report published online in February 2023 [1].

In a California criminal case, a man was accused of drug possession. At the defendant's request, two drug packages were tested for DNA using short tandem repeat (STR) markers. Both items were two-person mixtures that gave similar match statistic results.

On one item, Cybergenetics TrueAllele® probabilistic genotyping (PG) software found a strong exclusionary match statistic for the defendant of one over 1.2 million, with a false-negative error rate of one over 222 million. On the same item, ESR's STRmix™ PG program produced a weaker exclusionary match statistic of one over 24.

There was no trial. Based on the exculpatory DNA evidence, the prosecutor dropped the more serious DNA-related possession charge and offered a plea agreement. The court accepted the defendant's plea in March 2023.

The TrueAllele and STRmix PG software programs qualitatively agreed. Their likelihood ratio (LR) match statistics both supported the hypothesis that the defendant did not contribute his DNA to the drug package evidence. However, the magnitude of the LR match statistics differed between the software programs.

This letter briefly explains why the two PG software results differed. As JFS requested, we address some issues raised in the Case Report [1]. A more extensive response [2] to the paper [1] was posted online in May 2023, discussing 20 topics and examining 120 assertions.

The two programs were given different amounts of STR input data. TrueAllele is a fully Bayesian system capable of looking at all the (allelic and non-allelic) peak data without relying on laboratory-imposed data thresholds. Most other PG software applies peak height thresholds to limit the amount of input data. Peak heights are measured in relative fluorescence units (rfu).

TrueAllele used 210 data peaks across all 21 GlobalFiler™ STR loci, or 10 peaks per locus. At a 40 rfu threshold, the STRmix program saw 24 peaks across 14 loci, or just 1.7 peaks per locus. This 1.7 peak density is insufficient for an informative analysis of a two-person mixture, since at least three or four peaks would be needed. The 88% reduction in STRmix data peaks, relative to TrueAllele input, accounts for the observed LR output differences.
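The peak-density and data-reduction figures in the preceding paragraph follow from simple arithmetic on the quoted peak and locus counts. A minimal Python sketch (the function and variable names are illustrative and do not come from either program):

```python
# Peak and locus counts quoted in the letter.
trueallele_peaks, trueallele_loci = 210, 21
strmix_peaks, strmix_loci = 24, 14  # at the 40 rfu threshold


def peaks_per_locus(peaks, loci):
    """Average number of data peaks available per STR locus."""
    return peaks / loci


def fractional_reduction(full, reduced):
    """Fraction of the full data set that is lost in the reduced one."""
    return (full - reduced) / full


trueallele_density = peaks_per_locus(trueallele_peaks, trueallele_loci)
strmix_density = peaks_per_locus(strmix_peaks, strmix_loci)
reduction = fractional_reduction(trueallele_peaks, strmix_peaks)

print(f"TrueAllele: {trueallele_density:.1f} peaks per locus")
print(f"STRmix:     {strmix_density:.1f} peaks per locus")
print(f"Reduction:  {reduction:.1%}")
```

The computed reduction is about 88.6%, which the letter reports as 88%.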

We tested STRmix on the STR data at different thresholds, ranging from 0 rfu to 90 rfu, in 10 rfu increments. The weakest STRmix subsource LR value in our sensitivity study was one over 3.35 (using 11 peaks at a high 90 rfu threshold), while the strongest LR was one over 30.5 million (38 peaks at a low 20 rfu threshold). Less STRmix input data gave less output identification information; more data yielded more information.

At a 10 rfu threshold (54 peaks), the STRmix LR of one over 4.8 million was close to TrueAllele's reported one over 1.2 million. Given more data, STRmix got about the same LR results as TrueAllele. The difference in data input explains the difference between the reported TrueAllele and STRmix LR values in this case. The Case Report's “opinions” [3] did not.
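One way to see how close these values are is to put them on a log10 scale, a common convention for comparing likelihood ratios that is assumed here rather than stated in the letter. The sketch below uses only the LR values quoted in the text; pairing the reported "one over 24" with the 40 rfu threshold follows the 24-peak count given above, and all names are illustrative:

```python
import math

# Exclusionary LRs quoted in the letter, stored as N in "one over N".
# Keys are (program, threshold in rfu); TrueAllele applies no threshold (None).
lr_denominators = {
    ("STRmix", 90): 3.35,
    ("STRmix", 40): 24,
    ("STRmix", 20): 30.5e6,
    ("STRmix", 10): 4.8e6,
    ("TrueAllele", None): 1.2e6,
}


def log10_lr(denominator):
    """log10 of a likelihood ratio equal to 1/denominator."""
    return -math.log10(denominator)


log_lrs = {key: log10_lr(d) for key, d in lr_denominators.items()}
reference = log_lrs[("TrueAllele", None)]

for (program, threshold), value in log_lrs.items():
    label = "no threshold" if threshold is None else f"{threshold} rfu"
    gap = abs(value - reference)
    print(f"{program:10s} {label:>12s}: log10(LR) = {value:6.2f}, "
          f"gap to TrueAllele = {gap:.2f}")
```

On this scale the 10 rfu STRmix result lies about 0.6 log10 units from the TrueAllele value, while the 40 rfu result lies about 4.7 units away.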

The Case Report assumed that TrueAllele and STRmix software should produce similar LR answers on the same DNA evidence. With abundant DNA, where thresholds are not an issue, the two programs often agree. But TrueAllele's hierarchical modeling is specifically designed to process low-template DNA data. Different statistical models can lead to different answers.

The Case Report compared TrueAllele and STRmix probabilistic genotypes. However, TrueAllele numerically represents contributor genotypes using posterior probability, while STRmix uses likelihood-derived genotype “weights.” Probability and likelihood are different concepts whose numbers cannot be directly compared [4].
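The general distinction between likelihood and posterior probability can be illustrated with a toy Bayes calculation. The genotype labels, prior, and likelihood values below are invented for illustration and do not represent either program's model or output:

```python
# Hypothetical genotypes at one locus, with a made-up population prior
# (Hardy-Weinberg proportions for two equally frequent alleles A and B).
prior = {"A,A": 0.25, "A,B": 0.50, "B,B": 0.25}

# Hypothetical likelihoods: how well each genotype explains the data.
# Likelihoods are not probabilities over genotypes and need not sum to 1.
likelihood = {"A,A": 0.02, "A,B": 0.30, "B,B": 0.10}


def posterior_from(prior, likelihood):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


posterior = posterior_from(prior, likelihood)

print("sum of likelihoods:", round(sum(likelihood.values()), 6))
print("sum of posteriors: ", round(sum(posterior.values()), 6))
for genotype in prior:
    print(f"{genotype}: likelihood = {likelihood[genotype]:.3f}, "
          f"posterior = {posterior[genotype]:.3f}")
```

Because the posterior folds in the prior and is normalized to sum to one, its numbers sit on a different scale from the raw likelihoods for the same genotypes.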

The Case Report compared TrueAllele and STRmix mixture weights (MW). TrueAllele examined 10 peaks per locus at all 21 STR loci. This is enough STR pattern data for hierarchical MW modeling of a two-person mixture with differential DNA degradation. However, STRmix analyzed just 14 loci, averaging only 1.7 peaks per locus, which is insufficient genotyping data for determining MW. The Case Report looked at only a few nonrepresentative loci showing short STR molecules with little degradation.

The Case Report compared TrueAllele and STRmix LR reporting language. TrueAllele separates complex mixture data into probabilistic contributor genotypes, producing LR values that compare single-contributor genotypes [5]. STRmix calculates LR values based on how well a set of genotypes jointly explains unseparated mixture data [6]. The two approaches compute the same LR value [7], each having appropriate reporting language for its calculation method.

The Case Report took issue with reporting a “match.” However, the separated single-contributor LR language reports a match probability ratio, not a “match” [2]. Reporting “match” statistics (e.g., random “match” probability) has long been standard in forensic science [8].
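A match probability ratio of this kind can be sketched as the probability of the reference genotype after seeing the evidence, divided by its probability in the population, combined across loci under an independence assumption. This is a simplified illustration with invented numbers, not the TrueAllele implementation:

```python
import math


def locus_lr(posterior_prob, population_prob):
    """Ratio of a genotype's evidence-based probability to its population probability."""
    return posterior_prob / population_prob


# Hypothetical (posterior, population) probabilities of the reference
# genotype at three loci; these values are invented for illustration.
loci = [(0.0005, 0.05), (0.40, 0.10), (0.002, 0.08)]

# Assuming independent loci, per-locus LRs multiply, so their logs add.
total_log10_lr = sum(math.log10(locus_lr(q, r)) for q, r in loci)

print(f"log10(LR) = {total_log10_lr:.2f}")
```

A ratio below one at a locus (negative log10) means the evidence makes the reference genotype less probable than chance, which is the exclusionary direction discussed in this case.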

The Case Report speculated at length on why TrueAllele would give zero probability to two genotype values: locus D1 allele pair 14 14 and D22's 11 17. However, TrueAllele had assigned those allele pairs nonzero probabilities of 0.00022 and 0.00018, respectively.

TrueAllele can use more data from low-template DNA than other programs because it hierarchically models baseline noise and PCR variance [5]. This extra modeling obviates the need for peak height thresholds, so more STR data can be considered and more LR information derived.

TrueAllele constructs high-resolution LR distributions [9] for calculating LR error rates. This comprehensive method supports both false-positive rates for inclusionary match statistics and false-negative rates for exclusionary results [10, 11].

The Case Report cited only three TrueAllele validation studies [12-14]. In fact, from 2009 onward, there have been eight peer-reviewed studies, validating TrueAllele interpretation for mixtures containing 2 to 10 unknown contributors [5, 15-18].

The Case Report suggested that TrueAllele uses an “ad hoc” LR cutoff. In fact, as presented at AAFS in 2013, the LR floor is based on a validation study of the impact of single or double allele dropout on under-sampled LR values [19].

At PCAST's 2016 meeting, Dr. Perlin gave the committee 34 validation studies, including seven peer-reviewed papers [20]. In 14 of these studies, false inclusion error rates (i.e., false incrimination) were specifically addressed.

Defendants and victims are entitled to meaningful DNA evidence. With low-level mixtures, more data and more variables can deliver more LR information, whether exculpatory or inculpatory. The JFS Case Report advised crime laboratories to “punt” when they are unable to interpret DNA data using potentially limited software. But, as this case shows, advanced PG software that can use more data lets them “go for the goal” of truth.


Source journal

Journal of Forensic Sciences (Medicine: Medicine, Legal)

CiteScore: 4.00

Self-citation rate: 12.50%

Articles published: 215

Review time: 2 months

Journal description: The Journal of Forensic Sciences (JFS) is the official publication of the American Academy of Forensic Sciences (AAFS). It is devoted to the publication of original investigations, observations, scholarly inquiries and reviews in various branches of the forensic sciences. These include anthropology, criminalistics, digital and multimedia sciences, engineering and applied sciences, pathology/biology, psychiatry and behavioral science, jurisprudence, odontology, questioned documents, and toxicology. Similar submissions dealing with forensic aspects of other sciences and the social sciences are also accepted, as are submissions dealing with scientifically sound emerging science disciplines. The content and/or views expressed in the JFS are not necessarily those of the AAFS, the JFS Editorial Board, the organizations with which authors are affiliated, or the publisher of JFS. All manuscript submissions are double-blind peer-reviewed.