Utilization of Synthetic Near-Infrared Spectra via Generative Adversarial Network to Improve Wood Stiffness Prediction

IF 4.3 · Zone 3, Materials Science · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

ACS Applied Electronic Materials Pub Date : 2024-03-21 DOI:10.3390/s24061992

Syed Danish Ali, Sameen Raut, Joseph Dahlen, Laurence Schimleck, R. Bergman, Zhou Zhang, Vahid Nasir


Citations: 0

Abstract

Near-infrared (NIR) spectroscopy is widely used as a nondestructive evaluation (NDE) tool for predicting wood properties. Deploying NIR models requires representative training data; large datasets help, but often at significant cost. Machine learning and deep learning NIR models are at an even greater disadvantage because they typically require larger sample sizes for training. In this study, NIR spectra were collected to predict the modulus of elasticity (MOE) of southern pine lumber (training set = 573 samples, testing set = 145 samples). To compensate for the limited size of the training data, this study employed a generative adversarial network (GAN) to generate synthetic NIR spectra. The training dataset was fed into the GAN to generate 313, 573, and 1000 synthetic spectra. The original and enhanced datasets were used to train artificial neural networks (ANNs), convolutional neural networks (CNNs), and light gradient boosting machines (LGBMs) for MOE prediction. Overall, data augmentation using the GAN improved the coefficient of determination (R²) by up to 7.02% and reduced prediction error by up to 4.29%. ANNs and CNNs benefited more from synthetic spectra than LGBMs, which showed only slight improvement. All models performed best when 313 synthetic spectra were added to the original training data; further additions did not improve performance because the quality of the GAN-generated data points becomes poor beyond a certain threshold, and one of the main reasons for this may be the size of the initial training data fed into the GAN. LGBMs outperformed ANNs and CNNs on both the original and enhanced training datasets, which highlights the importance of selecting an appropriate machine learning or deep learning model for NIR spectral-data analysis. The results highlight the positive impact of the GAN on the predictive performance of models that use NIR spectroscopy as an NDE technique and monitoring tool for wood mechanical-property evaluation. Further studies should investigate the impact of the initial size of the training data, the optimal number of generated synthetic spectra, and which machine learning or deep learning models benefit most from data augmentation using GANs.
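The augmentation bookkeeping and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`augment`, `relative_change_pct`), the use of RMSE as the error measure, the reading of the reported percentages as relative changes, and the idea that each synthetic spectrum arrives with an MOE label are all assumptions made for illustration. The arrays are random placeholders, not NIR data, and the 50-column width is arbitrary.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error (assumed stand-in for 'error of predictions')."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_change_pct(baseline, augmented):
    """Percent change of a metric relative to its baseline value (assumed reading)."""
    return 100.0 * (augmented - baseline) / abs(baseline)


def augment(X_real, y_real, X_syn, y_syn, n_synthetic):
    """Append the first n_synthetic synthetic samples to the real training set."""
    X = np.vstack([X_real, X_syn[:n_synthetic]])
    y = np.concatenate([y_real, y_syn[:n_synthetic]])
    return X, y


# Placeholder data: 573 real training spectra (as in the abstract) and a pool
# of 1000 hypothetical GAN outputs, each with 50 arbitrary spectral channels.
rng = np.random.default_rng(0)
X_real, y_real = rng.normal(size=(573, 50)), rng.normal(size=573)
X_syn, y_syn = rng.normal(size=(1000, 50)), rng.normal(size=1000)

for n_syn in (313, 573, 1000):
    X_aug, y_aug = augment(X_real, y_real, X_syn, y_syn, n_syn)
    print(n_syn, X_aug.shape[0])  # 886, 1146, 1573 training samples
```

With these helpers, each model would be trained once on the original 573 samples and once on each enhanced set, then scored on the same 145-sample test set; `relative_change_pct` applied to the two R² (or RMSE) values gives a percentage comparable in form to those quoted above.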

Source journal

ACS Applied Electronic Materials

CiteScore: 7.20

Self-citation rate: 4.30%

Articles published: 567

About the journal: ACS Applied Electronic Materials is an interdisciplinary journal publishing original research covering all aspects of electronic materials. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials science, engineering, optics, physics, and chemistry into important applications of electronic materials. Sample research topics that span the journal's scope are inorganic, organic, ionic and polymeric materials with properties that include conducting, semiconducting, superconducting, insulating, dielectric, magnetic, optoelectronic, piezoelectric, ferroelectric and thermoelectric.

Indexed/Abstracted: Web of Science SCIE, Scopus, CAS, INSPEC, Portico