AI-driven feature selection and epigenetic pattern analysis: A screening strategy of CpGs validated by pyrosequencing for body fluid identification
Ming Zhao, Meiming Cai, Fanzhang Lei, Xi Yuan, Qinglin Liu, Yating Fang, Bofeng Zhu
Forensic Science International, Volume 367, Article 112339 (February 2025)
DOI: 10.1016/j.forsciint.2024.112339
https://www.sciencedirect.com/science/article/pii/S0379073824004213
Citations: 0
Abstract
Identifying body fluid stains at crime scenes is an important task in forensic evidence analysis. Body fluid-specific CpGs identified by DNA methylation microarray screening have been widely studied for forensic body fluid identification. However, some of these CpGs have limited ability to distinguish certain body fluid types, so novel methylation markers still need to be discovered and fully validated to strengthen their evidentiary value in complex forensic scenarios. This study collected forensic-related DNA methylation microarray data from the Gene Expression Omnibus (GEO) database. A novel marker screening strategy was developed that combines feature selection algorithms (elastic net, information gain ratio, Random Forest feature importance, and the mutual information coefficient) with epigenetic pattern analysis to identify CpG markers for body fluid identification. The selected CpGs were validated by pyrosequencing in peripheral blood, saliva, semen, vaginal secretion, and menstrual blood samples, and machine learning classification models were built from the sequencing results. Pyrosequencing identified 14 CpGs with high specificity across the five body fluid types. A machine learning classification model trained on the pyrosequencing results distinguished the five body fluid types effectively, achieving 100% accuracy on the test set. A reduced panel of six CpG markers also achieved ideal performance in identifying body fluid stains. This research proposes a systematic strategy for screening body fluid-specific CpGs, contributing new insights and methods to forensic body fluid identification.
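The abstract names information gain ratio and mutual information among the feature-selection criteria but gives no implementation details. As a minimal sketch only, and not the authors' pipeline, the Python below ranks CpGs by information gain ratio, assuming methylation beta values are binarized at a hypothetical 0.5 cut-off. For a discrete feature, the information gain it computes equals the mutual information between the CpG and the body fluid label. The CpG names (`cpg_A`, `cpg_B`, `cpg_C`) and all beta values are made-up toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(betas, threshold=0.5):
    """Binarize methylation beta values: 1 if >= threshold, else 0.

    The 0.5 cut-off is a hypothetical choice for illustration only.
    """
    return [1 if b >= threshold else 0 for b in betas]


def info_gain_and_ratio(feature, labels):
    """Return (information gain, gain ratio) of a discrete feature w.r.t. labels.

    For a discrete feature, information gain H(Y) - H(Y|X) equals the
    mutual information I(X; Y). The gain ratio divides it by the split
    information H(X), penalizing features with many distinct values.
    """
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def rank_cpgs(beta_matrix, labels, threshold=0.5):
    """Rank CpGs (name -> per-sample beta values) by gain ratio, highest first."""
    scores = {}
    for cpg, betas in beta_matrix.items():
        _, ratio = info_gain_and_ratio(discretize(betas, threshold), labels)
        scores[cpg] = ratio
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data: three hypothetical CpGs measured in six samples.
toy_labels = ["blood", "blood", "saliva", "saliva", "semen", "semen"]
toy_betas = {
    "cpg_A": [0.90, 0.85, 0.10, 0.15, 0.10, 0.05],  # methylated only in blood
    "cpg_B": [0.60, 0.40, 0.60, 0.40, 0.60, 0.40],  # splits every fluid evenly
    "cpg_C": [0.10, 0.20, 0.10, 0.15, 0.90, 0.95],  # methylated only in semen
}

for cpg, score in rank_cpgs(toy_betas, toy_labels):
    print(f"{cpg}: gain ratio = {score:.3f}")
```

Because each CpG is scored independently, this criterion alone does not account for redundancy between markers. Any such score should also be computed on training samples only, so that marker selection does not leak information from the test set.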
About the journal:
Forensic Science International is the flagship journal in the prestigious Forensic Science International family, publishing the most innovative, cutting-edge, and influential contributions across the forensic sciences. Fields include: forensic pathology and histochemistry, chemistry, biochemistry and toxicology, biology, serology, odontology, psychiatry, anthropology, digital forensics, the physical sciences, firearms, and document examination, as well as investigations of value to public health in its broadest sense, and the important marginal area where science and medicine interact with the law.
The journal publishes:
Case Reports
Commentaries
Letters to the Editor
Original Research Papers (Regular Papers)
Rapid Communications
Review Articles
Technical Notes