AI-driven feature selection and epigenetic pattern analysis: A screening strategy of CpGs validated by pyrosequencing for body fluid identification

IF 2.2; Zone 3 (Medicine); JCR Q1 (Medicine, Legal)
Ming Zhao, Meiming Cai, Fanzhang Lei, Xi Yuan, Qinglin Liu, Yating Fang, Bofeng Zhu
Forensic Science International, vol. 367, Article 112339. Published 2025-02-01. DOI: 10.1016/j.forsciint.2024.112339
https://www.sciencedirect.com/science/article/pii/S0379073824004213

Abstract

Identifying body fluid stains at crime scenes is an important task in forensic evidence analysis. Body fluid-specific CpGs identified by DNA methylation microarray screening have been widely studied for forensic body fluid identification. However, some CpGs have limited ability to distinguish certain body fluid types, so there is an ongoing need to discover novel methylation markers and fully validate them to strengthen their evidentiary value in complex forensic scenarios. This study gathered forensic-related DNA methylation microarray data from the Gene Expression Omnibus (GEO) database. A novel marker screening strategy was developed that combines feature selection algorithms (elastic net, information gain ratio, Random Forest feature importance, and mutual information coefficient) with epigenetic pattern analysis to identify CpG markers for body fluid identification. The selected CpGs were validated by pyrosequencing of peripheral blood, saliva, semen, vaginal secretion, and menstrual blood samples, and machine learning classification models were built from the sequencing results. Pyrosequencing revealed 14 CpGs with high specificity across the five body fluid types. A machine learning classification model built on the pyrosequencing results effectively distinguished the five body fluid types, achieving 100% accuracy on the test set. With six CpG markers, it was also feasible to attain ideal efficacy in identifying body fluid stains. This research proposes a systematic and scientific strategy for screening body fluid-specific CpGs, contributing new insights and methods to forensic body fluid identification.
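
One of the feature selection scores named in the abstract, the information gain ratio, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the CpG names, the toy beta values, the three-class label set, and the 0.5 binarization threshold are hypothetical and are not taken from the study, and the abstract does not describe how the authors computed or combined their scores.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels, threshold=0.5):
    """Information gain ratio of one CpG about the body fluid label.

    Methylation beta values are binarized at `threshold` (a hypothetical
    choice) so that the feature becomes discrete.
    """
    bins = ["high" if v >= threshold else "low" for v in values]
    n = len(labels)
    # Conditional entropy H(label | bin), weighted by bin frequency.
    conditional = 0.0
    for b in set(bins):
        subset = [lab for lab, bb in zip(labels, bins) if bb == b]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(bins)  # penalizes splits that are highly fragmented
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: six samples, three body fluid types.
labels = ["blood", "blood", "saliva", "saliva", "semen", "semen"]
betas = {
    "cpg_A": [0.10, 0.15, 0.12, 0.08, 0.90, 0.85],  # high only in semen
    "cpg_B": [0.40, 0.60, 0.45, 0.55, 0.52, 0.48],  # unrelated to fluid type
}

scores = {name: gain_ratio(vals, labels) for name, vals in betas.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy setting the semen-specific site ranks first because its binarized state is fully determined by the label, while the second site carries essentially no information about fluid type. A real screening pipeline would work on continuous beta values from many samples and would combine several such scores.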
Source journal
Forensic Science International (category: Medicine, Legal)
CiteScore: 5.00
Self-citation rate: 9.10%
Articles published: 285
Review time: 49 days
Journal description: Forensic Science International is the flagship journal in the Forensic Science International family, publishing innovative and influential contributions across the forensic sciences. Fields include forensic pathology and histochemistry, chemistry, biochemistry and toxicology, biology, serology, odontology, psychiatry, anthropology, digital forensics, the physical sciences, firearms, and document examination, as well as investigations of value to public health in its broadest sense and the important marginal area where science and medicine interact with the law. The journal publishes case reports, commentaries, letters to the editor, original research papers (regular papers), rapid communications, review articles, and technical notes.