Kang-Jae Kim, Jin-Ho Kim, Geunyong Park, Myung-Joon Jeong
Palpu Chongi Gisul/Journal of Korea Technical Association of the Pulp and Paper Industry, published 2023-10-30. DOI: https://doi.org/10.7584/jktappi.2023.10.55.5.13
Predictive Modeling of Korean Traditional Paper Characteristics Using Machine Learning Approaches (Part 2): Prediction of Carbonyl Content and Analysis of Variable Importance Using Random Forest
This paper introduces a random forest regression model trained on infrared spectral data to predict the carbonyl content of Hanji, a traditional Korean paper. The random forest model predicted carbonyl content well and outperformed the partial least squares model. To optimize the spectral range used for prediction, the input was restricted from the full range of 4000-400 cm⁻¹ to the narrower range of 1800-1200 cm⁻¹, which is known to be well suited to characterizing paper properties. This restriction raised the model's coefficient of determination from 0.921 to 0.937. A permutation variable importance measure was then applied to identify the spectral regions that contribute most to carbonyl content prediction, and it pinpointed the 1650-1350 cm⁻¹ range as crucial for accurate predictions. A new prediction model built exclusively on data from this region yielded higher coefficients of determination of 0.960 and 0.965 for the raw and second-derivative spectra, respectively. These findings support the validity and significance of the critical region identified by the permutation variable importance measure. The predictive performance of the established models is valid within the range of carbonyl content covered by the training set, 7.2 to 29.4 μmol/g.
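The two steps the abstract relies on, restricting the spectrum to a wavenumber band and ranking variables by permutation importance, can be illustrated with a minimal, model-agnostic sketch. It is not the authors' implementation. The data are synthetic, the function names (`select_band`, `permutation_importance`, `toy_predict`) are hypothetical, and `toy_predict` is a stand-in for the prediction function of a fitted regressor such as a random forest. Importance is taken here as the mean drop in the coefficient of determination when one spectral column is shuffled across samples.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_band(wavenumbers, spectra, low, high):
    """Keep only the columns whose wavenumber lies in [low, high] cm^-1."""
    keep = [j for j, w in enumerate(wavenumbers) if low <= w <= high]
    return [wavenumbers[j] for j in keep], [[row[j] for j in keep] for row in spectra]


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each column of X is shuffled across samples."""
    rng = random.Random(seed)
    baseline = r2_score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild X with only column j permuted; other columns are untouched.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic stand-in data (NOT from the paper): 5 wavenumbers, 40 spectra.
wavenumbers = [2000, 1700, 1500, 1400, 1000]
data_rng = random.Random(42)
spectra = [[data_rng.random() for _ in wavenumbers] for _ in range(40)]
# The toy target depends only on the 1500 cm^-1 column (arbitrary coefficients).
target = [10.0 + 20.0 * row[2] for row in spectra]

# Restrict the input to the 1800-1200 cm^-1 band before modeling.
band_wn, band_X = select_band(wavenumbers, spectra, 1200, 1800)


def toy_predict(X):
    """Hypothetical stand-in for a fitted model's predict()."""
    return [10.0 + 20.0 * row[1] for row in X]  # band column 1 is 1500 cm^-1


scores = permutation_importance(toy_predict, band_X, target)
most_important = band_wn[scores.index(max(scores))]
print(dict(zip(band_wn, scores)), most_important)
```

In this toy setup, shuffling a column the model ignores leaves R² unchanged, so only the column that drives the prediction receives a positive score. With real spectra, contiguous runs of high-scoring wavenumbers would be read as an important region, which is how a band such as 1650-1350 cm⁻¹ could be singled out.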