{"title":"Cost-effective optimisation and validation of the VISAGE enhanced tool assay on the Illumina NovaSeq 6000 platform","authors":"Lauren Verstraeten , Kristina Fokias , Gitte Saerens , Bram Bekaert","doi":"10.1016/j.fsigen.2025.103299","DOIUrl":"10.1016/j.fsigen.2025.103299","url":null,"abstract":"<div><div>Numerous forensic age prediction models based on DNA methylation markers have been developed, each differing in the number of predictive markers, statistical method, biomatrix, and sequencing platform used. This variability highlighted the need for more uniformity in the development of epigenetic clocks. To partially address this need, the VISAGE Consortium introduced the VISAGE enhanced tool assay, a multi-tissue assay that targets eight age-associated genes (<em>ELOVL2</em>, <em>EDARADD</em>, <em>ASPA</em>, <em>FHL2</em>, <em>MIR29B2CHG</em>, <em>KLF14</em>, <em>TRIM59</em>, and <em>PDE4C</em>). So far, three models were built using this assay for age prediction in blood, buccal cells, and bones, based on Illumina MiSeq sequencing data with the v3 reagent kit (2 × 300 bp). Unfortunately, the existing models are neither publicly accessible nor permitted for use in forensic casework. To address this limitation, we developed our own age estimation model utilising the VISAGE enhanced tool assay in combination with the Illumina NovaSeq 6000 platform and the v1.5 reagent kit (2 × 150 bp). By employing the same assay, we streamlined the workflow and enhanced uniformity, as there was no need to identify additional age-associated genes. By adjusting the assay’s primer concentrations, we achieved sufficient read depths to accurately determine methylation levels, even for longer amplicons with partial sequencing strand coverage. This modified assay was used to develop an age estimation model in blood (n = 98) with a mean absolute error (MAE) of 3.22 years and root mean squared error (RMSE) of 3.77 years in the test set (n = 30). 
Overall, this study demonstrated that by adjusting primer concentrations, equal age estimation performances can be achieved with the added benefit of drastically reduced costs and turn-around-time by using a 2 × 150 bp sequencing strategy. Additionally, this study was the first to independently validate the VISAGE enhanced tool assay on a different sequencing platform, exploring its potential for broader applications and partially answering the need for more uniformity.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103299"},"PeriodicalIF":3.2,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Informed consent for forensic genetic population studies: Status quo and a call for harmonization","authors":"Martin Bodner , Walther Parson","doi":"10.1016/j.fsigen.2025.103298","DOIUrl":"10.1016/j.fsigen.2025.103298","url":null,"abstract":"<div><div>Donor-signed informed consent is a fundamental prerequisite for ethically correct analysis and publication of genetic data in forensic population studies, including quality assessment of datasets and their inclusion into frequency databases. While considerations on the requirement and content of informed consent have been published, little information is available with regard to the actual nature of the documents currently in use. This study investigated 50 recent informed consent forms submitted to EMPOP and STRidER from a broad range of contributors across worldwide legislations, irrespective of the quality of the associated genetic data. The common ground of the informed consent forms, their specific content and differences, and the extent to which they contain suggested components are outlined. 
This evaluation of authentic informed consent form diversity adds to the discussion on formal aspects to be covered at the time of sampling and may expedite future harmonization of informed consent in forensic population studies, assuring ethical principles in the application of precious sample sets for a broad range of investigations across genetic disciplines.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103298"},"PeriodicalIF":3.2,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144067944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a multiplex recombinase amplification assay for the rapid and concurrent detection of human DNA and sex identification","authors":"Yazi Zheng , Guihong Liu , Qiushuo Wu , Mengyu Tan , Jiaming Xue , Mengna Wu , Lin Zhang , Meili Lv , Miao Liao , Shengqiu Qu , Weibo Liang","doi":"10.1016/j.fsigen.2025.103300","DOIUrl":"10.1016/j.fsigen.2025.103300","url":null,"abstract":"<div><div>In forensic practice, it is essential to identify human DNA and determine the sex of individuals from biological samples collected at crime scenes. Currently, the common detection methods mainly focus on targeted DNA analysis based on PCR technology, which is time-consuming and relies on laboratory equipment. In recent years, recombinase polymerase amplification (RPA), one of the ubiquitous isothermal amplification technologies, has gained popularity across various diagnostic fields due to its advantages of rapid processing and minimal temperature control requirements. This study developed a multiplex RPA assay suitable for identifying human and sex-specific components. The assay has good sensitivity (as low as 25 pg) and strong tolerance to inhibitors (in the presence of 200 ng/μL humic acid, 400 ng/μL tannic acid, and 8000 ng/μL collagen). Furthermore, we combined alkaline lysis with RPA detection to construct a rapid detection scheme, which can shorten detection time to half an hour. We also conducted a preliminary exploration of the visualization scheme for the constructed RPA assay. 
The above research demonstrates simultaneous and rapid detection of human and sex components, offering an accurate and sensitive detection scheme.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103300"},"PeriodicalIF":3.2,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143935047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing forensic body fluid identification: A comparative analysis of RT-LAMP+CRISPR-Cas12a and established mRNA-based methods","authors":"Olivia L. Martin , Courtney R.H. Lynch , Rachel Fleming","doi":"10.1016/j.fsigen.2025.103297","DOIUrl":"10.1016/j.fsigen.2025.103297","url":null,"abstract":"<div><div>In forensic science, the analysis of body fluid evidence determines the cellular origin of a sample, aiding in the reconstruction of a potential crime. Messenger ribonucleic acid (mRNA) based confirmatory tests address limitations of current conventional methods, providing increased specificity and sensitivity, minimal sample consumption, and the detection of a broader range of body fluids. However, they require expensive instrumentation and longer reaction times, and they lack portability. Reverse-transcription loop-mediated isothermal amplification (RT-LAMP) coupled with clustered regularly interspaced short palindromic repeats (CRISPR) with CRISPR-associated protein 12a (Cas12a) has the potential to overcome these challenges. This approach offers reduced testing time and cost, while potentially providing equivalent sensitivity and specificity, as observed in the field of viral diagnostics. Visual detection capabilities enable the development of rapid, portable screening tests suitable for testing at the crime scene. In the context of a sexual assault investigation, RT-LAMP+CRISPR-Cas12a could potentially increase the efficiency and detection rate. This study compares this novel method to two other mRNA-based methods: the endpoint reverse transcription polymerase chain reaction (RT-PCR) multiplex assay CellTyper 2 and a real-time reverse transcription quantitative PCR (RT-qPCR) multiplex assay. The tests’ sensitivity and specificity were evaluated on single-source and mixed body fluid samples, including rectal mucosa, a fluid which is minimally explored in forensic literature. 
The RT-qPCR assay demonstrated the highest sensitivity, specificity, and precision in mixed samples. In addition, RT-qPCR offers a greater linear dynamic range, faster processing time and easier methodology compared to CellTyper 2, only limited by its expensive nature. Notably, rectal mucosa samples exhibited non-specific marker expression of CellTyper 2 markers and expression of <em>CYP2B7P</em> (vaginal fluid) for all methods. This emphasises the need for a dedicated rectal mucosa marker. RT-LAMP+CRISPR-Cas12a exhibited a high specificity, displaying off-target expression of <em>CYP2B7P</em> in two fluid types. However, the method lacked sensitivity and precision for most markers except <em>MMP3</em> (menstrual blood), demonstrating detection down to 1:10,000 with 100 % specificity. RT-LAMP+CRISPR requires further development, but its quick, inexpensive nature and high specificity suggest it has potential as a confirmatory test that could reduce the limitations of existing methods.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103297"},"PeriodicalIF":3.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Denoising of microhaplotype MPS data using DADA2 and its application to two-person DNA mixture analysis","authors":"Ye‑Lim Kwon , Jiwon Kim , Su Min Joo , Kyoung‑Jin Shin","doi":"10.1016/j.fsigen.2025.103295","DOIUrl":"10.1016/j.fsigen.2025.103295","url":null,"abstract":"<div><div>With the advent of phase-known sequencing enabled by massively parallel sequencing (MPS), research on microhaplotypes (microhaps), multi-single nucleotide polymorphisms within short DNA fragments, has advanced significantly in forensic genetics. However, MPS data inherently contains PCR and sequencing errors, presenting challenges in distinguishing minor contributor alleles from background noise in DNA mixture analysis. Divisive Amplicon Denoising Algorithm 2 (DADA2) has been widely used in microbial research for inferring amplicon sequence variants (ASVs) through computational error correction. However, its potential applicability to forensic identity testing has not been fully explored. In this study, we redesigned an in-house MPS panel targeting 24 multipurpose microhaps and established a pipeline employing DADA2’s ASV inference algorithm to denoise microhap MPS data. Denoising performance was evaluated using 1 ng of DNA from 50 single-source samples. The average <em>not suppressed noise</em> level decreased from 1.2 % to 0.1 % after denoising, achieving a genotype concordance rate of 99.5 % with undenoised data. However, DADA2 had difficulty distinguishing heterozygous alleles differing only by a single indel. In two-person DNA mixture analysis, the DADA2-denoising pipeline reduced the number of noise haplotypes by 10-fold across various ratios (1:10, 1:20, 1:50, and 1:100) using 1 ng of total DNA. Even at a 1:100 ratio with 10 pg of minor DNA, noise was detected in only two or fewer markers among the 24 microhaps. 
These findings highlight the potential of computational error correction for enhancing the accuracy of detecting minor alleles and estimating the number of contributors in forensic analyses.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103295"},"PeriodicalIF":3.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143928963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calibration performance of DNAStatistX in a laboratory setting and considerations for likelihood ratio thresholds","authors":"Moya McCarthy-Allen , Jerry Hoogenboom , Rolf J.F. Ypma , Corina C.G. Benschop","doi":"10.1016/j.fsigen.2025.103293","DOIUrl":"10.1016/j.fsigen.2025.103293","url":null,"abstract":"<div><div>A previous study examining calibration and discrimination performance highlighted the need for caution when interpreting “low” likelihood ratios (LRs) derived from maximum likelihood estimate-based models DNAStatistX and EuroForMix [1]. The study reported that calibration performance was dependent on the dataset, dataset size and the subpopulation correction factor (Fst). In the worst case scenario (smallest dataset and Fst 0.01) miscalibration of LRs occurred up to LR ∼1000. In the best case scenario (largest dataset and Fst 0.03) there were signs of miscalibration up to LR ∼100 but not above. In the current study, the discrimination power and calibration performance were examined for DNAStatistX using a dataset that more closely reflects our casework practice. This involved analysing PowerPlex® Fusion 6C data using two different analytical threshold sets and up to three PCR replicate profiles in the LR calculation. The results showed calibration performance that was comparable to or better than previous findings for maximum likelihood estimate (MLE)-based models. The use of two different sets of analytical thresholds yielded similar results. Calibration performance decreased when replicate profiles were combined in the LR calculation. Additionally, this study demonstrates that using per-dye LRs to assess calibration performance can be beneficial, especially when the dataset size is limited. Overall, the findings support previous research, suggesting that setting a lower threshold for reporting is useful when using MLE-based models. Ideally, the threshold is as low as possible, as that may avoid overlooking valuable evidence. 
An LR value of 1000 seems supported by the data.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103293"},"PeriodicalIF":3.2,"publicationDate":"2025-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating SNP typing using alternative reference materials with the FORCE panel and QIAseq® chemistry","authors":"Lindsay L. Kotchey , Sophie Lee , Leah Nangeroni , Jacqueline Tyler Thomas , Adam Staadig , Andreas Tillmar , Kimberly Sturk-Andreaggi , Charla Marshall , Mirna Ghemrawi","doi":"10.1016/j.fsigen.2025.103294","DOIUrl":"10.1016/j.fsigen.2025.103294","url":null,"abstract":"<div><div>Advancements in forensic science have introduced single nucleotide polymorphism (SNP) markers as crucial tools in kinship, ancestry, and identity testing, and in predicting phenotypic traits. The emergence of Forensic Genetic Genealogy (FGG) and massively parallel sequencing (MPS) technologies has further enhanced the utility of SNP markers, generating interest in their application within the forensic community. The FORensic Capture Enrichment (FORCE) panel, targeting 5497 SNPs including those associated with ancestry, phenotype, identity, and kinship, is specifically designed for direct kinship comparisons and the recovery of autosomal markers for direct identification in cases where traditional short tandem repeat (STR) typing is unsuitable.</div><div>This study aimed to evaluate alternative reference materials—hair roots, hair shafts, and fingernail clippings—using the FORCE panel and QIAseq® chemistry for direct identification. We assessed SNP recovery and concordance between these materials and buccal swabs, and we compared predictions for phenotype, Y-haplogroup, and biogeographic ancestry. Additionally, the study examined the concordance and performance on two sequencing platforms: MiSeq FGx and NextSeq 550.</div><div>Our results demonstrated high (99.62–100 %) SNP concordance rates between alternative reference materials and buccal swabs, with fingernail samples showing the highest SNP recovery and concordance of the alternative reference materials. 
Phenotype, ancestry, and Y-haplogroup predictions from alternative materials were 100 % consistent with those from buccal samples. However, some discrepancies in phenotype predictions were noted when comparing predictions to self-reported data. Both sequencing platforms provided comparable results, with the NextSeq 550 mid-output kit offering 3.5X higher coverage and potential cost-saving advantages through greater sample multiplexing.</div><div>In conclusion, the FORCE panel, combined with QIAseq® chemistry, effectively profiles SNPs from various alternative reference materials, offering reliable concordance and comprehensive genotype recovery even from low-input and degraded samples. This highlights its potential application in forensic investigations where traditional reference samples are unavailable.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103294"},"PeriodicalIF":3.2,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143916104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying a monozygotic twin brother as a donor of DNA in minimal, mixed forensic stains – A case example","authors":"Kristiaan J. van der Gaag , Vincent van Marion , Redmar R. van den Berg , Natalie E.C. Weiler , Jerry Hoogenboom , Arnoud Kal , Manfred Kayser , Peter de Knijff , Jeroen F.J. Laros , Titia Sijen , Klaas Slooten","doi":"10.1016/j.fsigen.2025.103292","DOIUrl":"10.1016/j.fsigen.2025.103292","url":null,"abstract":"<div><div>In forensic casework, monozygotic twins have always provided a challenge, as routinely used forensic Short Tandem Repeat (STR) profiles are not able to differentiate between the twin individuals. In this study, we applied a method to discriminate between two monozygotic twin brothers in a sexual assault case that is unique and challenging for several reasons: the use of contact stains as evidence, the stains contain DNA from two persons (victim and one of the brothers), have minimal amounts of DNA, and there are PCR inhibiting factors. Despite these challenging factors, we present a successfully solved case in which whole genome sequencing was applied to identify multiple somatic differences between the two brothers. Validation of the developed methods and the identified differences was performed on material provided by the two siblings, before applying the method on two evidentiary stains. A statistical framework was developed to provide a likelihood calculation for this type of analysis in mixed stains. The results were accepted in court and contributed to the conviction of the case suspect. 
Here we provide the scientific details in order to encourage the use of this approach in more such cases in the future.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103292"},"PeriodicalIF":3.2,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An overview of the mtDNA diversity across the Colombian Andean region","authors":"Adriana Castillo , Verónica Gomes , Humberto Ossa , Bibiana Ribeiro , Maria João Prata , Fernando Rondón , Filipa Simão , Leonor Gusmão","doi":"10.1016/j.fsigen.2025.103288","DOIUrl":"10.1016/j.fsigen.2025.103288","url":null,"abstract":"<div><div>In Colombia, a country in the northwest corner of South America, populations are highly diverse due to the intercontinental admixture of Native Americans, European settlers, and enslaved Africans. While genetic diversity has been largely assessed based on autosomal markers, studies on mtDNA are much scarcer, allowing only a fragmentary view of the distribution of maternal lineages in the country. In this study the genetic diversity of maternal lineages in Colombian Andean populations was interrogated to infer whether the pattern of structuring was in line with the different colonization histories of the departments within the region. The ultimate goal was to establish a haplotype database for forensic purposes. In a total of 458 individuals born and residing in the departments of the Andean region, haplotypes of the total mtDNA control region were determined and assigned to the corresponding haplogroups. Across the 10 departments, haplotype diversities ranged between 0.9665 and 0.9967, and power of exclusion between 0.9208 and 0.9845. A component ascribed to be of Native American ancestry prevailed in all departments, where 89.27 % of haplotypes in the total sample belonged to mtDNA macro-haplogroups A2, B4, C1, and D. The remaining lineages were of Eurasian (6.65 %) or African (4.08 %) origin. Pairwise <em>F</em><sub>ST</sub> values showed signs of genetic differentiation, but still only reached statistical significance when Risaralda or Cundinamarca were compared with other populations. Principal component analysis showed that the population structure was mainly due to some differences in Native American substrates. 
The results obtained highlighted a heterogeneity within Andean populations that must be considered when developing mtDNA haplotype databases for forensic purposes. In this context, the use of specific databases is recommended for the departments of Risaralda and Cundinamarca, while the other departments can rely on a single haplotype frequency database.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103288"},"PeriodicalIF":3.2,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable artificial intelligence in forensic DNA analysis: Alleles identification in challenging electropherograms using supervised machine learning methods","authors":"Mengyu Tan , Yuxuan Tan , Haoyan Jiang , Jiaming Xue , Qiushuo Wu , Yazi Zheng , Guihong Liu , Yuanyuan Xiao , Meili Lv , Miao Liao , Lin Zhang , Shengqiu Qu , Weibo Liang","doi":"10.1016/j.fsigen.2025.103289","DOIUrl":"10.1016/j.fsigen.2025.103289","url":null,"abstract":"<div><div>Challenging samples in capillary electrophoresis (CE)-based short tandem repeat (STR) analysis often produce artefactual signals that cannot be completely filtered out by expert electropherogram (EPG) reading systems, complicating allele interpretation. Previous studies have demonstrated the potential of artificial intelligence (AI) to address this issue by accurately distinguishing allele signals from artefacts in EPGs. Traditional machine learning models offer significant advantages in enhancing the interpretability and transparency of AI models used in DNA analysis, particularly in criminal investigations and legal contexts. In this study, five traditional machine learning algorithms were employed to train and construct models using EPG signal datasets from single-source low-template EPGs, mixture EPGs, and combined datasets. Performance evaluation and validation with additional datasets demonstrated the feasibility of these models in improving the reportability of potential information in EPGs. However, further optimization is needed for mixture EPGs to enhance classification accuracy. Implementing Receiver Operating Characteristic (ROC) curve analysis and prediction probability thresholds effectively reduced false positive classifications. 
Additionally, a user-friendly platform was developed for EPG signal classification based on machine learning and ensemble learning, allowing for the classification of any signal datasets using traditional machine learning models and combining the prediction results of multiple models. This platform will provide analysts with more optimal and robust results. This study shows that machine-learning-based EPG signal classification models can significantly enhance the efficiency of sample analysis and interpretation, providing a solid foundation for future research.</div></div>","PeriodicalId":50435,"journal":{"name":"Forensic Science International-Genetics","volume":"78 ","pages":"Article 103289"},"PeriodicalIF":3.2,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}