Commentary on: Thompson WC. Uncertainty in probabilistic genotyping of low template DNA: A case study comparing STRmix™ and TrueAllele®. J Forensic Sci. 2023;68(3):1049–63
Mark W. Perlin PhD, MD, PhD, Nasir Butt PhD, Mark R. Wilson PhD
Journal of Forensic Sciences, 69(4), 1516-1518. Published 2024-04-25. DOI: 10.1111/1556-4029.15518. https://onlinelibrary.wiley.com/doi/10.1111/1556-4029.15518
Abstract
This Letter is a response to “Uncertainty in probabilistic genotyping of low template DNA: A case study comparing STRmix™ and TrueAllele®,” a Journal of Forensic Sciences (JFS) Case Report published online in February 2023 [1].
In a California criminal case, a man was accused of drug possession. At the defendant's request, two drug packages were tested for DNA using short tandem repeat (STR) markers. Both items were two-person mixtures that gave similar match statistic results.
On one item, Cybergenetics TrueAllele® probabilistic genotyping (PG) software found a strong exclusionary match statistic for the defendant of one over 1.2 million, with a false-negative error rate of one over 222 million. On the same item, ESR's STRmix™ PG program produced a weaker exclusionary match statistic of one over 24.
There was no trial. Based on the exculpatory DNA evidence, the prosecutor dropped the more serious DNA-related possession charge and offered a plea agreement. The court accepted the defendant's plea in March 2023.
The TrueAllele and STRmix PG software programs qualitatively agreed. Their likelihood ratio (LR) match statistics both supported the hypothesis that the defendant did not contribute his DNA to the drug package evidence. However, the magnitude of the LR match statistics differed between the software programs.
This letter briefly explains why the two PG software results differed. As JFS requested, we address some issues raised in the Case Report [1]. A more extensive response [2] to the paper [1] was posted online in May 2023, discussing 20 topics and examining 120 assertions.
The two programs were given different amounts of STR input data. TrueAllele is a fully Bayesian system capable of looking at all the (allelic and non-allelic) peak data without relying on laboratory-imposed data thresholds. Most other PG software applies peak height thresholds to limit the amount of input data. Peak heights are measured in relative fluorescence units (rfu).
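As a rough illustration of how a peak height threshold limits input data, the sketch below filters a list of peak heights at several thresholds. The peak heights are hypothetical values chosen for illustration only, not data from the case, and the helper name `peaks_above` is an assumption rather than part of any PG software.

```python
# Hypothetical peak heights (rfu) at a single locus; NOT data from the case.
peak_heights_rfu = [12, 18, 27, 35, 44, 61, 95, 130]


def peaks_above(heights, threshold_rfu):
    """Keep only the peaks whose height meets or exceeds the threshold."""
    return [h for h in heights if h >= threshold_rfu]


for threshold in (0, 10, 40, 90):
    kept = peaks_above(peak_heights_rfu, threshold)
    print(f"{threshold:>2} rfu threshold keeps {len(kept)} of {len(peak_heights_rfu)} peaks")
```

In this toy example, raising the threshold steadily discards the lower peaks, so a thresholded program sees less of the signal.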
TrueAllele used 210 data peaks across all 21 GlobalFiler™ STR loci, or 10 peaks per locus. At a 40 rfu threshold, the STRmix program saw 24 peaks across 14 loci, or just 1.7 peaks per locus. This 1.7 peak density is insufficient for an informative analysis of a two-person mixture, since at least three or four peaks would be needed. The 88% reduction in STRmix data peaks, relative to TrueAllele input, accounts for the observed LR output differences.
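The peak-density figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the counts stated in the letter; the variable names are illustrative.

```python
# Peak and locus counts as reported in the letter.
trueallele_peaks, trueallele_loci = 210, 21
strmix_peaks, strmix_loci = 24, 14

trueallele_density = trueallele_peaks / trueallele_loci  # peaks per locus
strmix_density = strmix_peaks / strmix_loci              # peaks per locus
reduction = 1 - strmix_peaks / trueallele_peaks          # fraction of peaks not used

print(f"TrueAllele: {trueallele_density:.1f} peaks per locus")
print(f"STRmix:     {strmix_density:.1f} peaks per locus")
print(f"Reduction in peaks: {reduction:.1%}")
```

The computed reduction is about 88.6%, which the letter reports as 88%.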
We tested STRmix on the STR data at different thresholds, ranging from 0 rfu to 90 rfu, in 10 rfu increments. The weakest STRmix subsource LR value in our sensitivity study was one over 3.35 (using 11 peaks at a high 90 rfu threshold), while the strongest LR was one over 30.5 million (38 peaks at a low 20 rfu threshold). Less STRmix input data gave less output identification information; more data yielded more information.
At a 10 rfu threshold (54 peaks), the STRmix LR of one over 4.8 million was close to TrueAllele's reported one over 1.2 million. Given more data, STRmix got about the same LR results as TrueAllele. The difference in data input explains the difference between the reported TrueAllele and STRmix LR values in this case. The Case Report's “opinions” [3] did not.
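Because these LR values span many orders of magnitude, they are easier to compare on a log10 scale. The sketch below converts the figures reported in this letter. It assumes that the reported STRmix LR of one over 24 corresponds to the 40 rfu, 24-peak input, as the letter implies; the function name `log10_lr` is illustrative.

```python
import math

# (threshold in rfu, peaks used, LR denominator), as reported in the letter.
# Assumption: the reported STRmix LR of one over 24 is the 40 rfu run.
strmix_runs = [
    (90, 11, 3.35),
    (40, 24, 24),
    (20, 38, 30.5e6),
    (10, 54, 4.8e6),
]
trueallele_denominator = 1.2e6  # reported TrueAllele LR of one over 1.2 million


def log10_lr(denominator: float) -> float:
    """Return log10 of an exclusionary LR written as 'one over denominator'."""
    return -math.log10(denominator)


for threshold, peaks, denom in strmix_runs:
    print(f"STRmix {threshold:>2} rfu, {peaks:>2} peaks: log10(LR) = {log10_lr(denom):.2f}")
print(f"TrueAllele, 210 peaks:      log10(LR) = {log10_lr(trueallele_denominator):.2f}")
```

On this scale, the 10 rfu STRmix result and the TrueAllele result differ by roughly 0.6 log10 units, whereas the 40 rfu result differs from TrueAllele by roughly 4.7.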
The Case Report assumed that TrueAllele and STRmix software should produce similar LR answers on the same DNA evidence. With abundant DNA, where thresholds are not an issue, the two programs often agree. But TrueAllele's hierarchical modeling is specifically designed to process low-template DNA data. Different statistical models can lead to different answers.
The Case Report compared TrueAllele and STRmix probabilistic genotypes. However, TrueAllele numerically represents contributor genotypes using posterior probability, while STRmix uses likelihood-derived genotype “weights.” Probability and likelihood are different concepts whose numbers cannot be directly compared [4].
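The general relationship between the two concepts is given by Bayes' theorem. In the sketch below, the symbols are assumptions introduced for illustration: g is a candidate contributor genotype and d is the STR peak data. This is the textbook identity, not a description of either program's internal calculation.

```latex
\Pr(g \mid d) \;=\; \frac{\Pr(d \mid g)\,\Pr(g)}{\sum_{g'} \Pr(d \mid g')\,\Pr(g')}
```

The likelihood Pr(d | g), viewed as a function of g, need not sum to one over genotypes, while the posterior probability Pr(g | d) does. A number on one scale therefore cannot be read as a number on the other without accounting for the prior and the normalization.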
The Case Report compared TrueAllele and STRmix mixture weights (MW). TrueAllele examined 10 peaks per locus at all 21 STR loci. This is enough STR pattern data for hierarchical MW modeling of a two-person mixture with differential DNA degradation. However, STRmix analyzed just 14 loci, averaging only 1.7 peaks per locus, which is insufficient genotyping data for determining MW. The Case Report looked at only a few nonrepresentative loci showing short STR molecules with little degradation.
The Case Report compared TrueAllele and STRmix LR reporting language. TrueAllele separates complex mixture data into probabilistic contributor genotypes, producing LR values that compare single-contributor genotypes [5]. STRmix calculates LR values based on how well a set of genotypes jointly explains unseparated mixture data [6]. The two approaches compute the same LR value [7], each with reporting language appropriate to its calculation method.
The Case Report took issue with reporting a “match.” However, the separated single-contributor LR language reports a match probability ratio, not a “match” [2]. Reporting “match” statistics (e.g., random “match” probability) has long been standard in forensic science [8].
The Case Report speculated at length on why TrueAllele would give zero probability to two genotype values: allele pair 14,14 at locus D1 and allele pair 11,17 at locus D22. However, TrueAllele had assigned those allele pairs nonzero probabilities of 0.00022 and 0.00018, respectively.
TrueAllele can use more data from low-template DNA than other programs because it hierarchically models baseline noise and PCR variance [5]. This extra modeling obviates the need for peak height thresholds, so more STR data can be considered and more LR information derived.
TrueAllele constructs high-resolution LR distributions [9] for calculating LR error rates. This comprehensive method supports both false-positive rates for inclusionary match statistics and false-negative rates for exclusionary results [10, 11].
The Case Report cited only three TrueAllele validation studies [12-14]. In fact, from 2009 onward, there have been eight peer-reviewed studies validating TrueAllele interpretation for mixtures containing 2 to 10 unknown contributors [5, 15-18].
The Case Report suggested that TrueAllele uses an “ad hoc” LR cutoff. In fact, as presented at AAFS in 2013, the LR floor is based on a validation study of the impact of single or double allele dropout on under-sampled LR values [19].
At PCAST's 2016 meeting, Dr. Perlin gave the committee 34 validation studies, including seven peer-reviewed papers [20]. In 14 of these studies, false inclusion error rates (i.e., false incrimination) were specifically addressed.
Defendants and victims are entitled to meaningful DNA evidence. With low-level mixtures, more data and more variables can deliver more LR information, whether exculpatory or inculpatory. The JFS Case Report advised crime laboratories to “punt” when they are unable to interpret DNA data using potentially limited software. But, as this case shows, advanced PG software that can use more data lets them “go for the goal” of truth.
About the journal:
The Journal of Forensic Sciences (JFS) is the official publication of the American Academy of Forensic Sciences (AAFS). It is devoted to the publication of original investigations, observations, scholarly inquiries and reviews in various branches of the forensic sciences. These include anthropology, criminalistics, digital and multimedia sciences, engineering and applied sciences, pathology/biology, psychiatry and behavioral science, jurisprudence, odontology, questioned documents, and toxicology. Similar submissions dealing with forensic aspects of other sciences and the social sciences are also accepted, as are submissions dealing with scientifically sound emerging science disciplines. The content and/or views expressed in the JFS are not necessarily those of the AAFS, the JFS Editorial Board, the organizations with which authors are affiliated, or the publisher of JFS. All manuscript submissions are double-blind peer-reviewed.