Larry Tang, Slun Booppasiri, Michael E. Sigman, Mary R. Williams
Title: Evaluating machine learning methods on a large-scale of in silico fire debris data
Journal: Forensic Chemistry, Volume 44, Article 100652
Publication date: 2025-03-12
DOI: 10.1016/j.forc.2025.100652
URL: https://www.sciencedirect.com/science/article/pii/S2468170925000141
Abstract
A large dataset of 240,000 fire debris samples has been generated in silico using a data augmentation method at the National Center for Forensic Science. The in-silico (IS) samples are balanced: 50 % contain ignitable liquid residue and 50 % contain only substrate components. In the big data era, this large dataset is useful for researchers who want to develop and implement new machine learning methods. In this paper, we split the data into a training dataset and a test dataset. We then trained seven machine learning methods (logistic regression, linear discriminant analysis, quadratic discriminant analysis, support vector machine, random forest, XGBoost, and neural network) on the in-silico training dataset. The predictive accuracy and area under the ROC curve (AUC) of the models were evaluated and compared on both an in-silico test dataset and an experimental fire debris dataset. In addition, we analyzed both total ion spectrum (TIS) and total ion chromatogram (TIC) datasets. For the TIS dataset, the neural network provides the highest AUC on both the in-silico test dataset and the experimental fire debris dataset. Random forest shows the highest performance for the TIC dataset when the retention index is binned.
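The two evaluation metrics named in the abstract, predictive accuracy and AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, chosen only to show how the metrics are computed from a classifier's predicted scores on a balanced set.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumption for illustration only.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation of the ROC area).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = ignitable liquid residue present, 0 = substrate only.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(accuracy(toy_labels, toy_scores))  # 0.75
print(roc_auc(toy_labels, toy_scores))   # 0.875
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a comparison of models may report both.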
About the journal:
Forensic Chemistry publishes high-quality manuscripts focusing on the theory, research and application of any chemical science to forensic analysis. The scope of the journal includes fundamental advancements that result in a better understanding of the evidentiary significance derived from the physical and chemical analysis of materials. The scope of Forensic Chemistry also includes the application and/or development of any molecular and atomic spectrochemical technique, electrochemical techniques, sensors, surface characterization techniques, mass spectrometry, nuclear magnetic resonance, chemometrics and statistics, and separation sciences (e.g. chromatography) that provide insight into the forensic analysis of materials. Evidential topics of interest to the journal include, but are not limited to, fingerprint analysis, drug analysis, ignitable liquid residue analysis, explosives detection and analysis, the characterization and comparison of trace evidence (glass, fibers, paints and polymers, tapes, soils and other materials), ink and paper analysis, gunshot residue analysis, synthetic pathways for drugs, toxicology, and the analysis and chemistry associated with the components of fingermarks. The journal is particularly interested in receiving manuscripts that report advances in the forensic interpretation of chemical evidence. Technology Readiness Level: When submitting an article to Forensic Chemistry, all authors will be asked to self-assign a Technology Readiness Level (TRL) to their article. The purpose of the TRL system is to help readers understand the level of maturity of an idea or method, to help track the evolution of readiness of a given technique or method, and to help filter published articles by the expected ease of implementation in an operational setting within a crime lab.