Feature selection of Raman spectra for forensic document examination using machine learning
Yong Ju Lee, Chang Woo Jeong, Kim Hong Taek, Tai-Ju Lee, Hyoung Jin Kim
Analyst, published 2025-03-18. DOI: 10.1039/d4an01529k
Abstract
Forensic document examination relies on differentiating and classifying document papers, particularly in cases involving forgery and fraud. In this study, document paper is classified by integrating Raman spectroscopy with machine learning models, namely random forest (RF), support vector machines (SVMs), and feed-forward neural networks (FNNs). Among these models, the RF model calculated feature importance and identified the critical spectral region contributing to classification, enhancing the transparency and interpretability of the result. Spectral preprocessing with the first derivative significantly improved classification performance. The spectral range 200–1,650 cm⁻¹ was identified as a highly informative region for differentiation; restricting the input to this range reduced the number of input variables from 756 to 360 while improving model accuracy. The FNN model outperformed the RF and SVM models, with an F1 score of 0.968. The results underscore the potential of combining Raman spectroscopy with machine learning for forensic document examination, offering an interpretable, computationally efficient, and robust approach for paper classification.
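The abstract does not give implementation details, so the following is only a minimal sketch of three steps it mentions: first-derivative preprocessing, restriction to the 200–1,650 cm⁻¹ range, and the F1 score. It makes several assumptions not taken from the paper: the derivative is a plain finite difference (the authors may have used a smoothed derivative), the wavenumber grid and intensities are synthetic toy values, the F1 score is computed for a single class, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def first_derivative(
    wavenumbers: Sequence[float], intensities: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Finite-difference derivative, reported at interval midpoints.

    Assumption: a simple two-point difference, not the paper's method.
    """
    mids, derivs = [], []
    for i in range(len(wavenumbers) - 1):
        step = wavenumbers[i + 1] - wavenumbers[i]
        mids.append((wavenumbers[i] + wavenumbers[i + 1]) / 2)
        derivs.append((intensities[i + 1] - intensities[i]) / step)
    return mids, derivs


def select_range(
    wavenumbers: Sequence[float],
    values: Sequence[float],
    low: float = 200.0,
    high: float = 1650.0,
) -> Tuple[List[float], List[float]]:
    """Keep only points whose wavenumber lies in [low, high] (cm^-1)."""
    kept = [(w, v) for w, v in zip(wavenumbers, values) if low <= w <= high]
    return [w for w, _ in kept], [v for _, v in kept]


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 = 2PR / (P + R) for one class treated as the positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic toy spectrum on a hypothetical 50 cm^-1 grid (not the paper's data).
wavenumbers = [100.0 + 50.0 * i for i in range(60)]
intensities = [(w / 1000.0) ** 2 for w in wavenumbers]

mid, deriv = first_derivative(wavenumbers, intensities)
sel_w, sel_d = select_range(mid, deriv)
print(f"input variables: {len(mid)} -> {len(sel_w)} after range selection")

# Toy labels for two hypothetical paper classes "A" and "B".
y_true = ["A", "A", "B", "B", "A"]
y_pred = ["A", "B", "B", "B", "A"]
print(f"F1 for class A: {f1_score(y_true, y_pred, 'A'):.3f}")
```

In a real pipeline the selected derivative values would be the input features for the RF, SVM, or FNN classifier, and the range would be chosen from the RF feature importances rather than fixed in advance as it is here.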