Malte B. Nielsen , Poul S. Eriksen , Helle S. Mogensen , Niels Morling , Mikkel M. Andersen
Forensic Science International: Genetics, volume 78, Article 103291. Published 2025-05-17. DOI: 10.1016/j.fsigen.2025.103291
https://www.sciencedirect.com/science/article/pii/S1872497325000717
Enhanced SNP genotyping with symmetric multinomial logistic regression
In genotyping, determining single nucleotide polymorphisms (SNPs) is standard practice, but it becomes difficult when analysing small quantities of input DNA, as is often required in forensic applications. Existing SNP genotyping methods, such as the HID SNP Genotyper Plugin (HSG) from Thermo Fisher Scientific, perform well with adequate DNA input levels but often produce erroneously called genotypes when DNA quantities are low. To mitigate these errors, genotype quality can be checked with the HSG. However, enforcing the HSG’s quality checks decreases the call rate by introducing more no-calls, and it does not eliminate all wrong calls. This study presents and validates a symmetric multinomial logistic regression (SMLR) model designed to enhance genotyping accuracy and call rate with small amounts of DNA. Comprehensive bootstrap and cross-validation analyses across a wide range of DNA quantities demonstrate the robustness and efficiency of the SMLR model in maintaining high call rates without compromising accuracy compared to the HSG. For DNA amounts as low as 31.25 pg, the SMLR method reduced the rate of no-calls by 50.0% relative to the HSG while maintaining the same rate of wrong calls, resulting in a call rate of 96.0%. Similarly, SMLR reduced the rate of wrong calls by 55.6% while maintaining the same call rate, achieving an accuracy of 99.775%. The no-call and wrong-call rates were significantly reduced at 62.5–250 pg DNA. The results highlight the SMLR model’s utility in optimising SNP genotyping at suboptimal DNA concentrations, making it a valuable tool for forensic applications where sample quantity and quality may be decreased. This work reinforces the feasibility of statistical approaches in forensic genotyping and provides a framework for implementing the SMLR method in practical forensic settings. The SMLR model applies to genotyping biallelic data with a signal (e.g. reads, counts, or intensity) for each allele. The model can also improve the allele balance quality check.
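The abstract does not give the model's parameterisation, so the following Python sketch is only an illustration of what a symmetric three-class logistic model with a no-call threshold could look like for biallelic data with one signal per allele. Everything specific in it is an assumption made for the example: the log-ratio feature with a pseudocount, the two parameters `alpha` and `beta`, the symmetry constraint as written (swapping the two allele signals swaps the two homozygote probabilities), the gradient-ascent fit, the `min_prob` threshold, and the toy data. It is not the authors' implementation.

```python
import math

# Genotype classes for a biallelic SNP, indexed 0, 1, 2.
GENOTYPES = ("A1/A1", "A1/A2", "A2/A2")


def log_ratio(signal_1, signal_2, pseudocount=1.0):
    """Assumed feature: log-ratio of allele signals (pseudocount avoids log(0))."""
    return math.log((signal_1 + pseudocount) / (signal_2 + pseudocount))


def genotype_probs(signal_1, signal_2, alpha, beta):
    """Softmax over three genotypes with a symmetry constraint (assumed form).

    The heterozygote is the reference class (linear predictor 0); the two
    homozygotes share alpha and use +beta and -beta on the log-ratio.
    """
    x = log_ratio(signal_1, signal_2)
    etas = (alpha + beta * x, 0.0, alpha - beta * x)
    top = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(eta - top) for eta in etas]
    total = sum(weights)
    return tuple(w / total for w in weights)


def log_likelihood(data, alpha, beta):
    """Sum of log-probabilities of the true genotypes; data holds (s1, s2, index)."""
    return sum(math.log(genotype_probs(s1, s2, alpha, beta)[g]) for s1, s2, g in data)


def fit(data, alpha=0.0, beta=1.0, learning_rate=0.01, steps=500):
    """Plain gradient ascent on the mean log-likelihood (illustrative only)."""
    n = len(data)
    for _ in range(steps):
        grad_alpha = grad_beta = 0.0
        for s1, s2, g in data:
            x = log_ratio(s1, s2)
            p = genotype_probs(s1, s2, alpha, beta)
            resid_hom1 = (g == 0) - p[0]  # observed minus predicted, A1/A1
            resid_hom2 = (g == 2) - p[2]  # observed minus predicted, A2/A2
            grad_alpha += resid_hom1 + resid_hom2
            grad_beta += x * (resid_hom1 - resid_hom2)
        alpha += learning_rate * grad_alpha / n
        beta += learning_rate * grad_beta / n
    return alpha, beta


def call_genotype(signal_1, signal_2, alpha, beta, min_prob=0.99):
    """Return the most probable genotype, or "no-call" below the threshold."""
    probs = genotype_probs(signal_1, signal_2, alpha, beta)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENOTYPES[best] if probs[best] >= min_prob else "no-call"


if __name__ == "__main__":
    # Made-up training signals (allele 1, allele 2, genotype index).
    toy_data = [
        (200, 3, 0), (120, 8, 0), (30, 12, 0),
        (90, 110, 1), (60, 45, 1), (25, 10, 1),
        (4, 180, 2), (9, 140, 2), (11, 28, 2),
    ]
    alpha_hat, beta_hat = fit(toy_data)
    for s1, s2 in [(200, 3), (100, 100), (30, 12)]:
        print((s1, s2), call_genotype(s1, s2, alpha_hat, beta_hat))
```

In a sketch like this, raising `min_prob` trades call rate for accuracy: more ambiguous signal pairs become no-calls, and fewer uncertain genotypes are reported.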
About the journal:
Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts.
The scope of the journal includes:
Forensic applications of human polymorphism: testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies.
Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms: autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
Non-human DNA polymorphisms for crime scene investigation.
Population genetics of human polymorphisms of forensic interest: population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
DNA typing methodologies and strategies.
Biostatistical methods in forensic genetics: evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches.
Standards in forensic genetics: recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards.
Quality control: quality control and quality assurance strategies, proficiency testing for DNA typing methodologies.
Criminal DNA databases: technical, legal and statistical issues.
General ethical and legal issues related to forensic genetics.