In silico created fire debris data for Machine learning

IF 2.6 | Zone 3 (Medicine) | JCR Q2 | CHEMISTRY, ANALYTICAL
Michael E. Sigman, Mary R. Williams, Larry Tang, Slun Booppasiri, Nikhil Prakash
Forensic Chemistry, Volume 42, Article 100633. Published 2024-12-06. DOI: 10.1016/j.forc.2024.100633. URL: https://www.sciencedirect.com/science/article/pii/S2468170924000857
Citations: 0

Abstract

Attacking complex forensic problems, such as classifying fire debris data as positive or negative for ignitable liquid residue (ILR), requires large amounts of training data if machine learning approaches are to succeed. This work examines the in silico preparation of computed fire debris data for training a machine learning method to classify gas chromatography–mass spectrometry (GC–MS) data as positive or negative for ILR, and reports the outcome of validation tests on a set of laboratory-generated fire debris samples with known ground truth. A set of 240,000 total ion chromatograms (TIC) and total ion spectra (TIS) for fire debris (FD) samples was calculated in silico (IS). The IS FD sample set was balanced: 50% of the samples contained both ILR and substrate pyrolysis (SUB) contributions, and the remaining 50% contained only SUB components. The ignitable liquids incorporated into the ILR-containing samples were digitally evaporated to simulate the weathering observed in experimental fire debris. The IS FD sample TIS were treated by principal component analysis (PCA) with centering and variance scaling, retaining 90% of the variance. A set of 1,117 experimental FD samples was projected into the IS FD PCA model. The recovered experimental FD TIS were compared to the TIS before projection by calculating the residual mean squared error (RMSE) for each sample, as a test of how well the IS FD samples represent experimental samples. The RMSE ranged over [0.012, 0.127], with a median of 0.029. Experimental FD samples whose recovered TIS had larger RMSE values were not well represented by the IS FD samples. The IS FD samples were randomly split into balanced sets for machine learning (ML) training (90%) and validation (10%). An XGBoost ML method, trained on the IS FD training data, was validated on the held-out IS FD data, giving a receiver operating characteristic (ROC) curve with an area under the curve (AUC) of 0.978. Validation of the model against the experimental FD data gave a lower ROC AUC of 0.845. Limiting the experimental data to samples in the lowest quartile of RMSE values increased the ROC AUC to 0.90.
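The projection test described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data below are synthetic stand-ins, the function names `fit_pca` and `reconstruction_rmse` are hypothetical, and two details are assumptions because the abstract does not specify them: the per-sample error is taken as the square root of the mean squared residual, and it is computed in the original (unscaled) units.

```python
import numpy as np


def fit_pca(train, variance_kept=0.90):
    """Fit PCA with centering and variance scaling on the training data and
    keep enough components to explain at least `variance_kept` of the variance."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant channels
    z = (train - mean) / std
    # Rows of vt are the principal axes of the scaled data.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, variance_kept)) + 1, len(s))
    return mean, std, vt[:k]


def reconstruction_rmse(samples, mean, std, components):
    """Project samples into the PCA model, recover them, and return the
    per-sample root-mean-square residual (assumed error definition)."""
    z = (samples - mean) / std
    scores = z @ components.T
    recovered = (scores @ components) * std + mean
    return np.sqrt(np.mean((samples - recovered) ** 2, axis=1))


# Synthetic stand-ins: 40 "spectral channels" driven by 5 latent factors.
rng = np.random.default_rng(0)
loadings = rng.normal(size=(5, 40))
in_silico = rng.normal(size=(2000, 5)) @ loadings + 0.05 * rng.normal(size=(2000, 40))

# "Experimental" samples: half follow the model, half carry extra structure
# that the in silico set does not contain.
well_covered = rng.normal(size=(50, 5)) @ loadings + 0.05 * rng.normal(size=(50, 40))
poorly_covered = well_covered + 2.0 * rng.normal(size=(50, 40))
experimental = np.vstack([well_covered, poorly_covered])

mean, std, components = fit_pca(in_silico)
rmse = reconstruction_rmse(experimental, mean, std, components)

# Samples in the lowest quartile of RMSE are the best represented ones.
lowest_quartile = rmse <= np.quantile(rmse, 0.25)
print("components kept:", components.shape[0])
print("median RMSE (well covered, poorly covered):",
      np.median(rmse[:50]), np.median(rmse[50:]))
```

In this toy setting the samples that carry structure absent from the training set reconstruct with larger error, which is the same reasoning the abstract uses to flag experimental samples that the in silico data do not represent well.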


Source journal: Forensic Chemistry (CHEMISTRY, ANALYTICAL)
CiteScore: 5.70
Self-citation rate: 14.80%
Articles published: 65
Review time: 46 days
About the journal: Forensic Chemistry publishes high-quality manuscripts focusing on the theory, research and application of any chemical science to forensic analysis. The scope of the journal includes fundamental advancements that result in a better understanding of the evidentiary significance derived from the physical and chemical analysis of materials. The scope of Forensic Chemistry also includes the application and/or development of any molecular and atomic spectrochemical technique, electrochemical techniques, sensors, surface characterization techniques, mass spectrometry, nuclear magnetic resonance, chemometrics and statistics, and separation sciences (e.g. chromatography) that provide insight into the forensic analysis of materials. Evidential topics of interest to the journal include, but are not limited to, fingerprint analysis, drug analysis, ignitable liquid residue analysis, explosives detection and analysis, the characterization and comparison of trace evidence (glass, fibers, paints and polymers, tapes, soils and other materials), ink and paper analysis, gunshot residue analysis, synthetic pathways for drugs, toxicology, and the analysis and chemistry associated with the components of fingermarks. The journal is particularly interested in receiving manuscripts that report advances in the forensic interpretation of chemical evidence. Technology Readiness Level: When submitting an article to Forensic Chemistry, all authors are asked to self-assign a Technology Readiness Level (TRL) to their article. The purpose of the TRL system is to help readers understand the level of maturity of an idea or method, to help track the evolution of readiness of a given technique or method, and to help filter published articles by the expected ease of implementation in an operational setting within a crime lab.