Making AI accessible for forensic DNA profile analysis

Impact factor: 3.1; CAS ranking: Zone 2 (Medicine); JCR: Q2, Genetics & Heredity
Abel K.J.G. de Wit, Claire D. Wagenaar, Nathalie A.C. Janssen, Brechtje Hoegen, Judith van de Wetering, Huub Hoofs, Simone Ariëns, Corina C.G. Benschop, Rolf J.F. Ypma
Forensic Science International: Genetics, Volume 81, Article 103345. Published 2025-08-19. DOI: 10.1016/j.fsigen.2025.103345
https://www.sciencedirect.com/science/article/pii/S1872497325001255

Abstract

Deep learning has the potential to be a powerful tool for automating allele calling in forensic DNA analysis. Studies to date have relied on bespoke model architectures and painstaking manual annotations to train models, which makes it challenging for other researchers to work with these techniques. In this study, we explore whether a well-performing model can be trained on data gathered as part of casework, using a widely adopted architecture: the U-Net. In this approach, annotations are created from alleles called during casework. The model, dubbed ‘DNANet’, then classifies each scan point in the electropherogram (EPG) as part of an allele or non-allele, building on the task of segmentation in computer vision. We evaluate performance on unseen case data and on independent mixture research data, taking analyst annotations as ground truth. We further compare DNANet’s performance with analyst performance on the research data, taking actual donor alleles as ground truth. DNANet reached an F1 score of 0.971 on analyst-annotated alleles in case data not seen during training, and 0.982 on the research data. On actual donor alleles, DNANet reached an F1 score of 0.962, equal to the F1 score computed from analyst annotations. Our results show that DNANet’s performance is comparable to that of human annotations following standard procedures. This illustrates the potential for obtaining good results with standard data and architecture. Future work may focus on which aspects of data, annotations or model architecture are key in shaping performance. We make our code, model weights and research data publicly available to aid the community. Lastly, we call for an effort to establish a standardized benchmark to aid quantitative comparisons between allele calling systems.
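The abstract frames allele calling as per-scan-point segmentation, with training labels derived from casework allele calls and evaluation by allele-level F1 score (F1 = 2PR / (P + R), the harmonic mean of precision P and recall R). The Python sketch below illustrates that framing under stated assumptions; it is not DNANet's code, which the authors release separately. The following choices are hypothetical and not taken from the study: each called allele is represented by the scan point of its peak and widened by a fixed `half_width`; the model outputs one probability per scan point, thresholded at 0.5; and a predicted segment counts as a hit if it overlaps a not-yet-matched annotated allele. All function names are illustrative, and the U-Net itself is omitted, with `probs` standing in for its output.

```python
from typing import List, Optional, Sequence, Tuple

# Half-open [start, end) range of scan points along the EPG trace.
Segment = Tuple[int, int]


def alleles_to_mask(centers: Sequence[int], n_points: int, half_width: int = 5) -> List[int]:
    """Build a per-scan-point 0/1 training mask from called-allele peak positions.

    Hypothetical labelling rule: every scan point within `half_width` of a
    called allele's peak is labelled 1 (allele); all other points are 0.
    """
    mask = [0] * n_points
    for c in centers:
        for i in range(max(0, c - half_width), min(n_points, c + half_width + 1)):
            mask[i] = 1
    return mask


def threshold(probs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Binarise per-scan-point allele probabilities, e.g. a U-Net's sigmoid output."""
    return [1 if p >= cutoff else 0 for p in probs]


def mask_to_segments(mask: Sequence[int]) -> List[Segment]:
    """Collapse runs of consecutive 1s into [start, end) segments (one per called peak)."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i  # a run of allele points begins
        elif not v and start is not None:
            segments.append((start, i))  # the run ended at the previous point
            start = None
    if start is not None:  # a run extends to the end of the trace
        segments.append((start, len(mask)))
    return segments


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two half-open segments share at least one scan point."""
    return a[0] < b[1] and b[0] < a[1]


def allele_f1(pred: Sequence[Segment], truth: Sequence[Segment]) -> float:
    """Allele-level F1 with greedy one-to-one matching by overlap (assumed criterion)."""
    matched = set()
    tp = 0
    for p in pred:
        for j, t in enumerate(truth):
            if j not in matched and overlaps(p, t):
                matched.add(j)
                tp += 1
                break
    fp = len(pred) - tp  # predicted alleles with no annotated counterpart
    fn = len(truth) - tp  # annotated alleles the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    n = 128
    truth = mask_to_segments(alleles_to_mask([20, 60, 100], n, half_width=3))
    probs = [0.0] * n
    # Two annotated alleles found, one spurious peak, one annotated allele missed.
    for lo, hi, prob in [(18, 23, 0.9), (58, 63, 0.8), (80, 83, 0.7)]:
        for i in range(lo, hi):
            probs[i] = prob
    pred = mask_to_segments(threshold(probs))
    print(round(allele_f1(pred, truth), 3))  # → 0.667
```

In the toy run, two of three predicted segments match and two of three annotated alleles are found, so precision and recall are both 2/3. One caveat of deriving ground-truth segments from the mask: alleles whose windows touch merge into a single segment, so a real pipeline would need a rule for closely spaced peaks.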
Source journal: Forensic Science International: Genetics
CiteScore: 7.50
Self-citation rate: 32.30%
Articles published: 132
Review time: 11.3 weeks
Journal description: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts. The scope of the journal includes:
- Forensic applications of human polymorphism: testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies.
- Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms: autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
- Non-human DNA polymorphisms for crime scene investigation.
- Population genetics of human polymorphisms of forensic interest: population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
- DNA typing methodologies and strategies.
- Biostatistical methods in forensic genetics: evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches.
- Standards in forensic genetics: recommendations of regulatory bodies concerning methods, markers, interpretation or strategies, or proposals for procedural or technical standards.
- Quality control: quality control and quality assurance strategies, proficiency testing for DNA typing methodologies.
- Criminal DNA databases: technical, legal and statistical issues.
- General ethical and legal issues related to forensic genetics.