Abel K.J.G. de Wit, Claire D. Wagenaar, Nathalie A.C. Janssen, Brechtje Hoegen, Judith van de Wetering, Huub Hoofs, Simone Ariëns, Corina C.G. Benschop, Rolf J.F. Ypma
Forensic Science International: Genetics, vol. 81, Article 103345 (published 2025-08-19). DOI: 10.1016/j.fsigen.2025.103345. URL: https://www.sciencedirect.com/science/article/pii/S1872497325001255
Making AI accessible for forensic DNA profile analysis
Deep learning has the potential to be a powerful tool for automating allele calling in forensic DNA analysis. Studies to date have relied on bespoke model architectures and painstaking manual annotations to train models, which makes it challenging for other researchers to work with these techniques. In this study, we explore the possibility of training a well-performing model using data gathered as part of casework, and employing a widely adopted architecture: the U-Net. In this approach, annotations are created from alleles called during casework. The model, dubbed ‘DNANet’, then classifies each scan point in the electropherogram (EPG) as part of an allele or non-allele, building on the task of segmentation in computer vision. We evaluate performance on unseen case data and on independent mixture research data, taking analyst annotations as ground truth. We further compare DNANet’s performance with analyst performance on the research data, taking actual donor alleles as ground truth. DNANet reached an F1 score of 0.971 on analyst-annotated alleles on case data not seen during training, and 0.982 on the research data. On actual donor alleles, DNANet reached an F1 score of 0.962, equal to the F1 score computed from analyst annotations. Our results show that DNANet’s performance is comparable to human annotations following standard procedures. This illustrates the potential for obtaining good results with standard data and architecture. Future work may focus on which aspects of data, annotations or model architecture are key in shaping performance. We make our code, model weights and research data publicly available to aid the community. Lastly, we call for an effort to establish a standardized benchmark to aid in quantitative comparisons between allele calling systems.
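To make the two ideas in the abstract concrete, the sketch below shows how called alleles could be turned into per-scan-point segmentation labels, and how an F1 score between two sets of allele calls can be computed. It is a minimal illustration and not the authors' implementation: the function names, the representation of an allele as a `(start, end)` scan-point span, the `(locus, allele)` tuples and the example values are all assumptions, and the abstract does not state at which level (allele or scan point) the reported F1 scores were computed.

```python
def annotations_to_mask(n_scan_points, allele_spans):
    """Build a binary label per scan point: 1 inside a called allele, else 0.

    allele_spans is a hypothetical list of (start, end) scan-point indices,
    with end exclusive, derived from alleles called during casework.
    """
    mask = [0] * n_scan_points
    for start, end in allele_spans:
        # Clip each span to the valid range of scan points.
        for i in range(max(start, 0), min(end, n_scan_points)):
            mask[i] = 1
    return mask


def f1_score(called, truth):
    """F1 score between two collections of allele calls.

    Each call is assumed to be a hashable item such as a (locus, allele) tuple.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(called)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Illustrative values only: 10 scan points with two annotated allele spans.
mask = annotations_to_mask(10, [(2, 4), (7, 9)])

# Illustrative calls: two of three calls agree with the reference set.
truth = {("D3S1358", "15"), ("D3S1358", "16"), ("vWA", "17")}
called = {("D3S1358", "15"), ("vWA", "17"), ("vWA", "18")}
score = f1_score(called, truth)
```

In this toy example one reference allele is missed and one extra allele is called, so precision and recall are both 2/3. Swapping `truth` between analyst annotations and actual donor alleles mirrors the two kinds of ground truth described in the abstract.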
About the journal:
Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts.
The scope of the journal includes:
Forensic applications of human polymorphism.
Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies.
Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms.
Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
Non-human DNA polymorphisms for crime scene investigation.
Population genetics of human polymorphisms of forensic interest.
Population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
DNA typing methodologies and strategies.
Biostatistical methods in forensic genetics.
Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches.
Standards in forensic genetics.
Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards.
Quality control.
Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies.
Criminal DNA databases.
Technical, legal and statistical issues.
General ethical and legal issues related to forensic genetics.