Explainable artificial intelligence in forensic DNA analysis: Alleles identification in challenging electropherograms using supervised machine learning methods

Impact factor 3.2; Zone 2 (Medicine); JCR Q2, Genetics & Heredity
Mengyu Tan, Yuxuan Tan, Haoyan Jiang, Jiaming Xue, Qiushuo Wu, Yazi Zheng, Guihong Liu, Yuanyuan Xiao, Meili Lv, Miao Liao, Lin Zhang, Shengqiu Qu, Weibo Liang
Forensic Science International: Genetics, volume 78, Article 103289. Published 2025-04-24. DOI: 10.1016/j.fsigen.2025.103289
https://www.sciencedirect.com/science/article/pii/S1872497325000699
Citations: 0

Abstract

Challenging samples in capillary electrophoresis (CE)-based short tandem repeat (STR) analysis often produce artefactual signals that cannot be completely filtered out by expert electropherogram (EPG) reading systems, complicating allele interpretation. Previous studies have demonstrated the potential of artificial intelligence (AI) to address this issue by accurately distinguishing allele signals from artefacts in EPGs. Traditional machine learning models offer significant advantages in enhancing the interpretability and transparency of AI models used in DNA analysis, particularly in criminal investigations and legal contexts. In this study, five traditional machine learning algorithms were employed to train and construct models using EPG signal datasets from single-source low-template EPGs, mixture EPGs, and combined datasets. Performance evaluation and validation with additional datasets demonstrated the feasibility of these models in improving the reportability of potential information in EPGs. However, further optimization is needed for mixture EPGs to enhance classification accuracy. Implementing Receiver Operating Characteristic (ROC) curve analysis and prediction probability thresholds effectively reduced false positive classifications. Additionally, a user-friendly platform was developed for EPG signal classification based on machine learning and ensemble learning, allowing for the classification of any signal datasets using traditional machine learning models and combining the prediction results of multiple models. This platform will provide analysts with more optimal and robust results. This study shows that machine-learning-based EPG signal classification models can significantly enhance the efficiency of sample analysis and interpretation, providing a solid foundation for future research.
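The two post-processing ideas in the abstract, choosing a prediction probability threshold from a ROC curve and combining the predictions of several models, can be illustrated with a minimal sketch. The abstract does not name the algorithms, features, or ensemble rule used, so everything below is an assumption for illustration: the function names are hypothetical, the probabilities are made up, labels are encoded as 1 for allele and 0 for artefact, and the ensemble is a plain average of probabilities (soft voting), which may differ from what the authors' platform does.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) per distinct score.

    A signal is called an allele when its score >= threshold.
    Assumed encoding: 1 = allele, 0 = artefact.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one allele and one artefact")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def pick_threshold(labels: Sequence[int], scores: Sequence[float],
                   max_fpr: float) -> float:
    """Lowest threshold whose false positive rate stays at or below max_fpr."""
    allowed = [t for t, fpr, _ in roc_points(labels, scores) if fpr <= max_fpr]
    if not allowed:
        raise ValueError("no threshold satisfies the requested false positive rate")
    return min(allowed)


def soft_vote(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the allele probabilities of several models, signal by signal."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


def classify(scores: Sequence[float], threshold: float) -> List[str]:
    """Label each signal using the chosen probability threshold."""
    return ["allele" if s >= threshold else "artefact" for s in scores]


# Made-up validation data: eight signals, two hypothetical models.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
model_a = [0.875, 0.75, 0.625, 0.75, 0.5, 0.25, 0.125, 0.125]
model_b = [0.875, 1.0, 0.375, 0.5, 0.75, 0.5, 0.125, 0.375]

ensemble = soft_vote([model_a, model_b])
strict = pick_threshold(labels, ensemble, max_fpr=0.0)    # no false positives allowed
lenient = pick_threshold(labels, ensemble, max_fpr=0.25)  # tolerate 1 in 4 artefacts
calls = classify(ensemble, strict)
```

On this toy data the strict threshold removes every false positive but keeps only two of the four alleles, while the lenient one recovers all four alleles at the cost of one misclassified artefact. That trade-off is what a ROC-based threshold makes explicit.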
Source journal
CiteScore: 7.50
Self-citation rate: 32.30%
Publication volume: 132
Review time: 11.3 weeks
Journal description: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts. The scope of the journal includes: Forensic applications of human polymorphism. Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies. Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms. Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications. Non-human DNA polymorphisms for crime scene investigation. Population genetics of human polymorphisms of forensic interest. Population data, especially from DNA polymorphisms of interest for the solution of forensic problems. DNA typing methodologies and strategies. Biostatistical methods in forensic genetics. Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches. Standards in forensic genetics. Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards. Quality control. Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies. Criminal DNA databases. Technical, legal and statistical issues. General ethical and legal issues related to forensic genetics.