Evaluating machine learning methods on a large-scale of in silico fire debris data

IF 2.6 | CAS zone 3 (Medicine) | JCR Q2, CHEMISTRY, ANALYTICAL
Larry Tang , Slun Booppasiri , Michael E. Sigman , Mary R. Williams
Forensic Chemistry, Volume 44, Article 100652. Journal article, published 2025-03-12. DOI: 10.1016/j.forc.2025.100652. https://www.sciencedirect.com/science/article/pii/S2468170925000141
Citations: 0

Abstract

A large dataset of 240,000 fire debris samples has been generated in silico (IS) at the National Center for Forensic Science using a data augmentation method. The IS samples are balanced: 50 % contain ignitable liquid residue (ILR) and 50 % contain only substrate components. A dataset of this size is useful for researchers developing and implementing new machine learning methods. In this paper, we split the data into a training dataset and a test dataset. We then trained seven machine learning methods on the in-silico training dataset: logistic regression, linear discriminant analysis, quadratic discriminant analysis, support vector machine, random forest, XGBoost, and neural network. The predictive accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) of each model were evaluated and compared on both an in-silico test dataset and an experimental fire debris dataset. In addition, we analyzed both total ion spectrum (TIS) and total ion chromatogram (TIC) datasets. For the TIS dataset, the neural network gives the highest AUC on both the in-silico test dataset and the experimental fire debris dataset. For the TIC dataset, random forest performs best when the retention index is binned.
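The abstract compares models by accuracy and AUC and mentions binning the TIC by retention index, but it does not say how either step was implemented. The following is a minimal, stdlib-only sketch under stated assumptions, not the authors' code: labels are 1 for ILR present and 0 for substrate only, scores are a model's predicted probability of ILR, and the 0.5 accuracy threshold, the retention-index range, and the 100-unit bin width are hypothetical choices. The function names `auc_score`, `accuracy`, and `bin_retention_index` are illustrative helpers, and the toy numbers are made up.

```python
import math


def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Hypothetical helper: returns the fraction of (positive, negative) pairs in
    which the positive sample (label 1, ILR present) receives the higher score;
    tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def bin_retention_index(ri_values, intensities, ri_min, ri_max, width):
    """Sum TIC intensities into fixed-width retention-index bins on [ri_min, ri_max).

    Hypothetical helper: the range and bin width are illustrative choices.
    Points outside the range are ignored.
    """
    n_bins = math.ceil((ri_max - ri_min) / width)
    bins = [0.0] * n_bins
    for ri, intensity in zip(ri_values, intensities):
        if ri_min <= ri < ri_max:
            # Guard against floating-point rounding pushing the index past the end.
            idx = min(int((ri - ri_min) // width), n_bins - 1)
            bins[idx] += intensity
    return bins


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(auc_score(labels, scores))  # 0.75
    print(accuracy(labels, scores))   # 0.5
    print(bin_retention_index([805, 812, 1010, 1095, 1150],
                              [1.0, 2.0, 3.0, 4.0, 5.0], 800, 1200, 100))
    # [3.0, 0.0, 7.0, 5.0]
```

In a comparison like the one described above, each trained model's scores on a test set would be passed to the same metric functions and the models ranked by AUC. The pairwise loop needs one comparison per (positive, negative) pair, which is slow at the scale of 240,000 samples; a sort-based rank computation, or scikit-learn's `sklearn.metrics.roc_auc_score`, is the practical choice there.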

Source journal
Forensic Chemistry (CHEMISTRY, ANALYTICAL)
CiteScore: 5.70
Self-citation rate: 14.80%
Articles published: 65
Review time: 46 days
Journal description: Forensic Chemistry publishes high-quality manuscripts focusing on the theory, research and application of any chemical science to forensic analysis. The scope of the journal includes fundamental advancements that result in a better understanding of the evidentiary significance derived from the physical and chemical analysis of materials. The scope of Forensic Chemistry also includes the application and/or development of any molecular and atomic spectrochemical technique, electrochemical techniques, sensors, surface characterization techniques, mass spectrometry, nuclear magnetic resonance, chemometrics and statistics, and separation sciences (e.g., chromatography) that provide insight into the forensic analysis of materials. Evidential topics of interest to the journal include, but are not limited to, fingerprint analysis, drug analysis, ignitable liquid residue analysis, explosives detection and analysis, the characterization and comparison of trace evidence (glass, fibers, paints and polymers, tapes, soils and other materials), ink and paper analysis, gunshot residue analysis, synthetic pathways for drugs, toxicology, and the analysis and chemistry associated with the components of fingermarks. The journal is particularly interested in receiving manuscripts that report advances in the forensic interpretation of chemical evidence.

Technology Readiness Level: When submitting an article to Forensic Chemistry, all authors will be asked to self-assign a Technology Readiness Level (TRL) to their article. The purpose of the TRL system is to help readers understand the maturity of an idea or method, to track the evolution of readiness of a given technique or method, and to filter published articles by the expected ease of implementation in an operational setting within a crime lab.