Comparison of decision tree and Naïve Bayes algorithms in detecting trace residue of gasoline based on GC–MS data

Journal metrics: IF 1.4; Zone 4 (Medicine); JCR Q3, MEDICINE, LEGAL
Md Gezani Bin Md Ghazi, Loong Chuen Lee, A S Samsudin, H Sino
Forensic Sciences Research, published 2023-09-19. DOI: 10.1093/fsr/owad031
Citations: 0

Abstract

Fire debris analysis aims to detect and identify any ignitable liquid residues in burnt residues collected at a fire scene. Typically, the burnt residues are analysed using gas chromatography–mass spectrometry (GC–MS) and interpreted manually. The interpretation process can be laborious because GC–MS data are complex and high-dimensional. Therefore, this study aims to compare the potential of classification and regression tree (CART) and naïve Bayes (NB) algorithms for analysing pixel-level GC–MS data of fire debris. The data comprise 14 positive (i.e. fire debris with traces of gasoline) and 24 negative (i.e. fire debris without traces of gasoline) samples. The differences between the positive and negative samples were first inspected using the mean chromatograms and the scores plots of principal component analysis (PCA). CART and NB algorithms were then applied independently to the GC–MS data. Stratified random resampling was used to prepare three sets of 200 training/testing pairs (split ratios of 7:3, 8:2, and 9:1) for estimating the prediction accuracies. Although the positive and negative samples could hardly be differentiated from the mean chromatograms and PCA scores plots, both the NB and CART predictive models performed satisfactorily on the normalized GC–MS data, with the majority achieving prediction accuracies above 70%. NB consistently outperformed CART in terms of testing-sample prediction accuracy and the corresponding risk of overfitting, except when evaluated using only 10% of the samples. The accuracy of CART was found to be inversely proportional to the number of testing samples, whereas NB performed consistently across the three split ratios. In conclusion, NB appears to be considerably better than CART in terms of robustness to the number of testing samples and a consistently lower risk of overfitting.
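The evaluation protocol summarized above (stratified random resampling into 200 training/testing pairs per split ratio, then scoring test-set accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses a Gaussian naïve Bayes (the abstract does not say which NB variant was used), assumes normalization means scaling each chromatogram to unit total intensity (the abstract only says the data were normalized), and runs on a small made-up dataset with the same 14/24 class balance instead of the study's GC–MS data. All function names are hypothetical.

```python
import math
import random
from statistics import mean


def normalize(chromatogram):
    """Scale one chromatogram to unit total intensity (assumed normalization)."""
    total = sum(chromatogram)
    return [v / total for v in chromatogram]


def stratified_split(labels, test_fraction, rng):
    """Draw the test set separately within each class so class proportions are kept."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per class: log prior plus per-feature mean and variance (features treated as independent)."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        columns = list(zip(*rows))
        means = [mean(col) for col in columns]
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), var_floor)
                     for col, m in zip(columns, means)]
        model[cls] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(cls):
        log_prior, means, variances = model[cls]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def resampled_accuracies(X, y, test_fraction, n_repeats=200, seed=0):
    """Test-set accuracy of Gaussian NB over n_repeats stratified random splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        train, test = stratified_split(y, test_fraction, rng)
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies


def make_synthetic_data(seed=1):
    """Made-up 3-feature 'chromatograms': 14 positive, 24 negative (NOT the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n, shift in (("positive", 14, 4.0), ("negative", 24, 0.0)):
        for _ in range(n):
            raw = [abs(rng.gauss(10.0 + shift, 1.0)), abs(rng.gauss(10.0, 1.0)),
                   abs(rng.gauss(5.0, 1.0))]
            X.append(normalize(raw))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for ratio, test_fraction in (("7:3", 0.3), ("8:2", 0.2), ("9:1", 0.1)):
        accs = resampled_accuracies(X, y, test_fraction)
        print(f"{ratio}: mean NB test accuracy {mean(accs):.2f} over {len(accs)} splits")
```

The resampling loop does not depend on the classifier, so a CART learner (for example, scikit-learn's `DecisionTreeClassifier`) could be substituted for the two NB functions to set up the same comparison. With 38 samples and the rounding used here, a 9:1 split leaves only 3 test samples per pair (1 positive, 2 negative), so each individual test accuracy can only take the values 0, 1/3, 2/3 or 1.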
Source journal: Forensic Sciences Research (MEDICINE, LEGAL)
CiteScore: 3.60; self-citation rate: 7.70%; articles published: 158; review time: 26 weeks