Comparison of data augmentation and classification algorithms based on plastic spectroscopy†

IF 2.6 | Zone 3 (Chemistry) | JCR Q2 (Chemistry, Analytical)

Analytical Methods. Pub Date: 2025-01-16. DOI: 10.1039/D4AY01759E

Jiachao Luo, Qunbiao Wu, Jin Cao, Haifeng Fang, Chenyang Xu and Defang He

Analytical Methods, volume 6, pages 1236-1251. Journal Article. https://pubs.rsc.org/en/content/articlelanding/2025/ay/d4ay01759e

Citations: 0

Abstract

Plastic waste management is a key issue in global environmental protection. Integrating spectroscopy acquisition devices with deep learning algorithms has emerged as an effective method for rapid plastic classification. However, the difficulty of collecting plastic samples and spectroscopy data has limited the number of available data samples and left comparisons of the relevant classification algorithms incomplete. To address this issue, we propose a plastic spectroscopy generation model and, on the basis of data augmentation, systematically analyze and compare the performance of different algorithms from multiple perspectives. First, we preprocess plastic spectral data collected from public datasets, acquired with techniques such as Fourier transform infrared spectroscopy (FTIR), Raman spectroscopy, and laser-induced breakdown spectroscopy (LIBS), using cubic interpolation, normalization, Savitzky–Golay (S–G) filtering, linear detrending, and the standard normal variate (SNV) transformation. Principal component analysis (PCA) visualization shows that these preprocessing steps help improve classification accuracy, and PCA loadings are used to explain the chemical classification features for each spectroscopic technique. Second, to tackle the insufficient sample size, we propose a plastic spectroscopy generation model based on C-GAN that effectively handles multi-class spectrum generation. The consistency of the generated spectra with real spectra is validated subjectively through difference spectroscopy and t-SNE, and this conclusion is validated objectively using maximum mean discrepancy (MMD).
Finally, we compared the classification accuracy of machine learning algorithms, including support vector machine (SVM), back-propagation neural network (BP), k-nearest neighbors (KNN), random forest (RF), and decision tree (DT), with that of deep learning algorithms such as GoogLeNet and ResNet under various conditions. After data augmentation with the plastic spectrum generation model, the accuracy of every classification model improved by at least 3% over its pre-augmentation level. Notably, for data collected via FTIR, classification accuracy peaked at 0.991 with the 1D-ResNet model when the data were augmented twofold. On small-sample datasets, traditional machine learning algorithms such as SVM and RF were stable and accurate, differing only minimally from deep learning algorithms; on large-sample datasets, deep learning algorithms showed a stronger advantage. Regarding data input format, 1D-input models generally outperformed 2D-input models. Grad-CAM visualizations further indicated that the 1D-ResNet model achieved the highest classification accuracy primarily because it identified peak features in the data more accurately.
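The preprocessing chain and the MMD check named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`resample_cubic`, `preprocess`, `mmd2_rbf`), the Savitzky–Golay window and polynomial order, the use of min-max normalization, the RBF kernel, and its `gamma` value are all hypothetical choices, since the abstract does not give these settings.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import detrend, savgol_filter


def resample_cubic(wavenumbers, intensities, grid):
    """Cubic-spline interpolation of one spectrum onto a shared grid.

    Assumes `wavenumbers` is strictly increasing.
    """
    return CubicSpline(wavenumbers, intensities)(grid)


def preprocess(spectra, window=11, polyorder=3):
    """Normalize, smooth, detrend, and SNV-transform spectra (one per row).

    `window` and `polyorder` are illustrative defaults, not values from the paper.
    Constant spectra would cause a division by zero and are not handled here.
    """
    x = np.asarray(spectra, dtype=float)
    # Min-max normalization of each spectrum to [0, 1] (assumed normalization).
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    x = (x - lo) / (hi - lo)
    # Savitzky-Golay smoothing along the spectral axis.
    x = savgol_filter(x, window_length=window, polyorder=polyorder, axis=1)
    # Remove a least-squares straight line from each spectrum.
    x = detrend(x, axis=1, type="linear")
    # SNV: centre and scale each spectrum by its own mean and standard deviation.
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / std


def mmd2_rbf(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two samples, RBF kernel."""

    def kernel(a, b):
        # Pairwise squared Euclidean distances, clipped against rounding error.
        sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * np.maximum(sq, 0.0))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

In a workflow like the one described, `mmd2_rbf(real, generated)` would be computed on preprocessed real and generated spectra of the same class, with a value near zero suggesting that the two sets are hard to tell apart under the chosen kernel.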



Source journal