Benchmarking for genotyping and imputation using degraded DNA for forensic applications across diverse populations
Authors: Elena I. Zavala, Rori V. Rohlfs, Priya Moorjani
Journal: Forensic Science International: Genetics, vol. 75, Article 103177 (journal article; JCR Q2, Genetics & Heredity; impact factor 3.2)
Published: 2024-11-17
DOI: 10.1016/j.fsigen.2024.103177
URL: https://www.sciencedirect.com/science/article/pii/S187249732400173X
Citations: 0
Abstract
Advancements in sequencing and laboratory technologies have enabled forensic genetic analysis of increasingly low-quality and degraded DNA samples. However, existing computational methods for genotyping and imputation used to generate DNA profiles from degraded DNA have not been tested for forensic applications. Here we simulated sequencing data of varying quality (coverage, fragment length, and deamination pattern) from forty individuals of diverse genetic ancestries. We used this dataset to test the performance of commonly used genotyping and imputation methods (SAMtools, GATK, ATLAS, Beagle, and GLIMPSE) on five SNP panels used in forensic and population genetics applications (MPS-plex, FORCE, two extended kinship panels, and the Human Origins array). For genome mapping and variant calling with degraded DNA, we find that parameters and methods developed for ancient DNA analysis (such as ATLAS) provide a marked improvement over the conventional standards used for next-generation sequencing analysis. ATLAS outperforms GATK and SAMtools, achieving over 90% genotyping accuracy for the four largest SNP panels at coverages greater than 10X. At lower coverages, decreased concordance rates are correlated with increased rates of heterozygosity. Genotype refinement and imputation improve accuracy at lower coverages by leveraging population reference data. For all five SNP panels, using a population reference panel representative of worldwide populations (e.g., the 1000 Genomes Project) results in higher genotype accuracy across genetic ancestries than using ancestry-matched reference panels. Importantly, we find that the low SNP density of commonly used forensic SNP panels can affect the reliability and performance of genotype refinement and imputation. This highlights a critical trade-off between enhancing privacy by using panels with fewer SNPs and maintaining the effectiveness of genomic tools. We provide benchmarks and recommendations for analyzing degraded DNA from diverse populations with widely used genomic methods in forensic casework.
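The simulation step described in the abstract (varying coverage, fragment length, and deamination) can be illustrated with a toy Python sketch. It assumes a double-stranded library, where deamination typically appears as C-to-T substitutions near 5' ends and G-to-A substitutions near 3' ends, and it models the damage probability as decaying exponentially with distance from the fragment end. The function names, the exponential fragment-length distribution, and the parameter defaults (`p_end`, `decay`, `mean_len`, `min_len`) are hypothetical choices for illustration, not the simulation settings used in the paper.

```python
import random


def damage_prob(dist_from_end: int, p_end: float = 0.3, decay: float = 0.5) -> float:
    """Hypothetical damage model: substitution probability at a position that is
    `dist_from_end` bases (0-based) from the relevant fragment end, decaying
    exponentially away from that end."""
    return p_end * decay ** dist_from_end


def deaminate(fragment: str, rng: random.Random,
              p_end: float = 0.3, decay: float = 0.5) -> str:
    """Apply C->T near the 5' end and G->A near the 3' end
    (the pattern expected for a double-stranded library)."""
    bases = list(fragment)
    n = len(bases)
    for i, base in enumerate(bases):
        if base == "C" and rng.random() < damage_prob(i, p_end, decay):
            bases[i] = "T"  # deaminated cytosine read as thymine (5' end)
        elif base == "G" and rng.random() < damage_prob(n - 1 - i, p_end, decay):
            bases[i] = "A"  # reflects a deaminated C on the complementary strand (3' end)
    return "".join(bases)


def sample_fragments(genome: str, coverage: float, mean_len: float,
                     rng: random.Random, min_len: int = 25) -> list[str]:
    """Draw about coverage * len(genome) / mean_len fragments with exponentially
    distributed lengths, floored at `min_len` and capped at the genome length."""
    n_frags = max(1, round(coverage * len(genome) / mean_len))
    fragments = []
    for _ in range(n_frags):
        length = max(min_len, int(rng.expovariate(1.0 / mean_len)))
        length = min(length, len(genome))
        start = rng.randrange(len(genome) - length + 1)
        fragments.append(genome[start:start + length])
    return fragments


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    frags = sample_fragments(genome, coverage=2.0, mean_len=50, rng=rng)
    damaged = [deaminate(f, rng) for f in frags]
    print(len(frags), damaged[0])
```

Because of the `min_len` floor, the realized coverage in this sketch is somewhat higher than the requested value; a real simulator would typically draw lengths from an empirical distribution instead.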
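Genotyping accuracy in benchmarks of this kind is usually reported as concordance between called and true genotypes. The Python sketch below shows one common way to compute it, broken down by true genotype class, together with an idealized calculation of how often a heterozygous site is observed as only one allele at low read depth. The dosage encoding (0/1/2, with `None` for a missing call), the choice to exclude missing calls from the concordance denominator, and the error-free read model are all assumptions; the paper's exact definitions may differ.

```python
from collections import Counter
from typing import Optional, Sequence


def concordance(truth: Sequence[int], called: Sequence[Optional[int]]):
    """Return (overall concordance, per-true-genotype concordance, call rate).

    Genotypes are alt-allele dosages (0 = hom-ref, 1 = het, 2 = hom-alt);
    None marks a missing call, which is excluded from concordance but
    lowers the call rate.
    """
    total, match = Counter(), Counter()
    n_called = 0
    for t, c in zip(truth, called):
        if c is None:
            continue
        n_called += 1
        total[t] += 1
        if t == c:
            match[t] += 1
    overall = sum(match.values()) / n_called if n_called else float("nan")
    per_class = {g: match[g] / total[g] for g in sorted(total)}
    call_rate = n_called / len(truth) if len(truth) else float("nan")
    return overall, per_class, call_rate


def het_single_allele_prob(depth: int) -> float:
    """Chance that all `depth` reads at a true heterozygous site carry the same
    allele, assuming error-free reads that sample each allele with p = 0.5."""
    return 1.0 if depth <= 1 else 2 * 0.5 ** depth


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0]
    called = [0, 0, 2, 1, None]
    print(concordance(truth, called))  # (0.75, {0: 1.0, 1: 0.5, 2: 1.0}, 0.8)
    for d in (1, 2, 5, 10):
        print(d, het_single_allele_prob(d))
```

Under this idealized model, a heterozygous site covered by a single read is always seen as one allele, and even at 10 reads the chance is about 0.2%. This is one intuition for why low-coverage calls benefit from population-informed genotype refinement and imputation.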
Journal description:
Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts.
The scope of the journal includes:
Forensic applications of human polymorphism: testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, and identification of human remains by DNA testing methodologies.
Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms: autosomal DNA polymorphisms, mini- and microsatellites (short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
Non-human DNA polymorphisms for crime scene investigation.
Population genetics of human polymorphisms of forensic interest: population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
DNA typing methodologies and strategies.
Biostatistical methods in forensic genetics: evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), using classical and new statistical approaches.
Standards in forensic genetics: recommendations of regulatory bodies concerning methods, markers, interpretation or strategies, or proposals for procedural or technical standards.
Quality control: quality control and quality assurance strategies, and proficiency testing for DNA typing methodologies.
Criminal DNA databases: technical, legal and statistical issues.
General ethical and legal issues related to forensic genetics.