A chemistry-based explainable machine learning model based on NIR spectra for predicting wood properties and understanding wavelength selection

Laurence Schimleck, Samuel Ayanleye, Stavros Avramidis, Vahid Nasir
Journal: Wood Material Science and Engineering
Published: 2023-10-05
DOI: 10.1080/17480272.2023.2265349 (https://doi.org/10.1080/17480272.2023.2265349)
Citations: 1

Abstract

A chemistry-based explainable machine learning (ML) approach was used to predict wood properties from near-infrared (NIR) spectral data collected from rough and smooth surfaces, and to better understand the role of important NIR wavelengths (features) in the performance of ML models. NIR spectra collected from western hemlock (Tsuga heterophylla) and coastal Douglas-fir (Pseudotsuga menziesii) boards with rough and smooth surfaces were fed into random forest and TreeNet, a gradient boosting machine algorithm, to predict wood density, modulus of elasticity (MOE), and modulus of rupture (MOR). The TreeNet model predicted MOE, MOR, and density with R² of 0.66, 0.64, and 0.64 using spectra collected from rough surfaces, and R² of 0.54, 0.46, and 0.46 using spectra collected from smooth surfaces. TreeNet outperformed random forest, and for both ML algorithms higher R² and lower error were obtained using NIR data collected from rough surfaces. This suggests that for Douglas-fir and western hemlock, NIR spectra could be collected on a sawn surface prior to surface planing. However, it is difficult to generalize the impact of surface roughness on the performance of predictive models, because different factors (e.g. what constitutes a smooth or rough surface, variability of the data set in terms of wood properties) affect their success. The NIR features with the greatest influence on the TreeNet models were examined and consistently had wood-chemistry-specific band assignments. The most important features occurred in the O–H first overtone and C–H second overtone regions, and in a narrow zone (approximately 2400–2500 nm) of the C–H stretch and C–C stretch combination region. Important features also differed by property and with surface roughness. Explaining ML model performance using the relative importance of the NIR features showed the importance of wood-chemistry-related information when developing models; however, MOE and MOR TreeNet models based on smooth-surface NIR spectra showed an increased importance of water-related features. Overall, the chemistry-based explainable machine learning approach allows important NIR features and regions to be identified, and aids in understanding how they contribute to the performance of NIR-based wood property predictive models.

KEYWORDS: Wood materials; mechanical properties; ensemble learning; gradient boosting machine; near-infrared spectroscopy; surface roughness

Disclosure statement: No potential conflict of interest was reported by the author(s).
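The workflow summarized in the abstract (fit a random forest and a gradient boosting model to spectra, compare R², then group feature importances by chemically meaningful band regions) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the authors' setup: the data are synthetic, the wavelength grid is hypothetical, the band boundaries are rough illustrative ranges, and scikit-learn's GradientBoostingRegressor stands in for TreeNet, which is a separate commercial gradient boosting implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical wavelength grid in nm (the abstract gives no instrument range).
wavelengths = np.arange(1000, 2500, 10)

# Approximate, illustrative band regions in nm (assumed boundaries).
regions = {
    "C-H second overtone": (1100, 1250),
    "O-H first overtone": (1400, 1500),
    "C-H + C-C combination": (2400, 2500),
}


def region_mask(lo, hi):
    """Boolean mask selecting wavelengths in [lo, hi)."""
    return (wavelengths >= lo) & (wavelengths < hi)


# Synthetic "spectra" and a synthetic property driven by one region plus noise.
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, wavelengths.size))
y = 3.0 * X[:, region_mask(1400, 1500)].mean(axis=1)
y = y + rng.normal(scale=0.3, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    # Stand-in for TreeNet; not the software used in the paper.
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    importances = model.feature_importances_
    # Sum per-wavelength importances within each band region.
    by_region = {
        label: float(importances[region_mask(lo, hi)].sum())
        for label, (lo, hi) in regions.items()
    }
    results[name] = {"r2": float(r2), "regions": by_region}
    print(name, "R2 =", round(r2, 2), by_region)
```

Because the synthetic target depends only on the assumed O-H first overtone region, that region should dominate the summed importances for both models; with real spectra the ranking would reflect the actual chemistry and surface condition of the samples.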