Predictive Modeling of Korean Traditional Paper Characteristics Using Machine Learning Approaches (Part 2): Prediction of Carbonyl Content and Analysis of Variable Importance Using Random Forest
Kang-Jae Kim, Jin-Ho Kim, Geunyong Park, Myung-Joon Jeong
Palpu Chongi Gisul/Journal of Korea Technical Association of the Pulp and Paper Industry. Published 2023-10-30. DOI: 10.7584/jktappi.2023.10.55.5.13 (https://doi.org/10.7584/jktappi.2023.10.55.5.13)
Abstract
This paper introduces a random forest regression model trained on infrared spectral data to predict the carbonyl content of Hanji, a traditional Korean paper. The random forest model predicted carbonyl content well and outperformed the partial least squares model. To optimize the spectral range used for prediction, the input was restricted from the full range of 4000-400 cm⁻¹ to the narrower range of 1800-1200 cm⁻¹, which is known to be well suited to characterizing paper properties. This restriction raised the model's coefficient of determination from 0.921 to 0.937. A permutation variable importance measure was then applied to identify the spectral regions that contribute most to carbonyl content prediction. The analysis identified the 1650-1350 cm⁻¹ range as a crucial region for accurate prediction. A new prediction model built using data exclusively from this region yielded further improved coefficients of determination of 0.960 and 0.965 for the raw and second derivative spectra, respectively. These findings support the validity and significance of the critical region identified by the permutation variable importance measure. The predictive performance of the established models is valid within the carbonyl content range of the training set, 7.2 to 29.4 μmol/g.
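The workflow summarized in the abstract (fit a random forest on spectra, rank wavenumbers by permutation importance, then refit on the important region) can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectra are synthetic, the single absorption band near 1500 cm⁻¹, the noise level, the 10 cm⁻¹ sampling step, and all hyperparameters are hypothetical choices made only so the example runs. Only the 1800-1200 cm⁻¹ and 1650-1350 cm⁻¹ windows and the 7.2-29.4 μmol/g range are taken from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for infrared spectra; NOT the paper's data.
rng = np.random.default_rng(0)
wavenumbers = np.arange(1800, 1199, -10)        # 1800 ... 1200 cm^-1 (assumed 10 cm^-1 step)
n_samples = 200                                 # hypothetical sample count
carbonyl = rng.uniform(7.2, 29.4, n_samples)    # umol/g, range quoted in the abstract

# Assumption: one Gaussian band centred at 1500 cm^-1 scales with carbonyl content.
band = np.exp(-((wavenumbers - 1500) / 60.0) ** 2)
noise = rng.normal(0.0, 0.02, (n_samples, wavenumbers.size))
spectra = 0.01 * carbonyl[:, None] * band[None, :] + noise

X_train, X_test, y_train, y_test = train_test_split(
    spectra, carbonyl, test_size=0.25, random_state=0
)

# Step 1: random forest on the 1800-1200 cm^-1 window.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
r2_wide = model.score(X_test, y_test)           # coefficient of determination (R^2)

# Step 2: permutation importance on held-out data. Each wavenumber column is
# shuffled in turn and the resulting drop in R^2 is recorded.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]
top_wavenumbers = wavenumbers[ranking[:5]]

# Step 3: refit using only the 1650-1350 cm^-1 region named in the abstract.
mask = (wavenumbers <= 1650) & (wavenumbers >= 1350)
narrow_model = RandomForestRegressor(n_estimators=100, random_state=0)
narrow_model.fit(X_train[:, mask], y_train)
r2_narrow = narrow_model.score(X_test[:, mask], y_test)

print("R^2, 1800-1200 cm^-1:", round(r2_wide, 3))
print("R^2, 1650-1350 cm^-1:", round(r2_narrow, 3))
print("Most important wavenumbers (cm^-1):", top_wavenumbers)
```

Permutation importance is computed on the held-out split here so that it reflects what the model needs for generalization rather than what it memorized. With real spectra, neighbouring wavenumbers are strongly correlated, so importance tends to spread across a band; reading it as a region, as the abstract does, is more robust than reading individual peaks.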