Tiantian Liu, Xiangna Han, Yafang Yin, Guanglan Xi, Zhiguo Zhang, Jian Sun, Gang Chen, Lintong Zhang, Liuyang Han
Journal of Cultural Heritage, Volume 76, Pages 86-98. Published 2025-09-20. DOI: 10.1016/j.culher.2025.09.005
Multi-property prediction of waterlogged archaeological wood based on Wasserstein GAN-augmented tree models
Waterlogged archaeological wood (WAW) is difficult to evaluate non-destructively, and few samples are available. To address both problems, this study developed a near-infrared (NIR) spectroscopy model for predicting the physico-mechanical properties of WAW. We also proposed a data augmentation framework based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). The framework expands the NIR spectral data of WAW together with four associated physico-mechanical parameters: maximum water content (MWC), basic density (BD), modulus of rupture (MOR), and fracture strain (FS). Two tree-based ensemble models, LightGBM (LGBM) and the Multi-Scale Derivative Enhanced Gradient Boosting Machine (MSDE-GBM), were built using the data generated by WGAN-GP. We also systematically investigated how the size of the augmented dataset affects model performance. The four physico-mechanical parameters of WAW were significantly correlated with one another. This supports the feasibility of a multi-target generation mechanism that synthesizes spectra together with their corresponding MWC, BD, MOR, and FS values. The WGAN-GP-generated spectra were noticeably noisy during the early training epochs. As training progressed, however, their morphology and smoothness increasingly approached those of the real spectra, improving both diversity and authenticity. Further experiments identified the optimal number of training epochs for each augmented dataset size: 4000 epochs for datasets expanded to 300 and 900 samples, and 6000 epochs for the 600-sample dataset. We then built models on data generated at these optimal epochs. These models confirmed that WGAN-GP augmentation significantly improved the performance of LGBM and MSDE-GBM in predicting MWC and BD. Compared with the original dataset, the optimal models reduced RMSE for MWC by 47.9 % (LGBM) and 59.9 % (MSDE-GBM). For BD, the reductions were 29.2 % (LGBM) and 13.3 % (MSDE-GBM).
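The RMSE reductions above are relative changes measured against the model trained on the original dataset. The abstract does not spell out its formulas, so the following is a minimal sketch assuming the usual definitions of RMSE, R², and percentage reduction. The function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


def relative_rmse_reduction(rmse_baseline: float, rmse_augmented: float) -> float:
    """Percentage drop in RMSE relative to the baseline (original-data) model."""
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_augmented) / rmse_baseline
```

Under these definitions, a 47.9 % reduction means the augmented model's RMSE is 0.521 times the baseline RMSE. A 59.9 % reduction corresponds to a factor of 0.401. An R² below 0.7 means the model accounts for less than 70 % of the variance in the measured values.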
Prediction accuracy for MOR and FS remained lower than for MWC and BD (R² < 0.7). This reflects the complex mapping between the micro-scale mechanical parameters (measured by thermomechanical analysis, TMA) and the NIR spectra. This study pioneers the simultaneous prediction of multiple WAW performance parameters, providing a novel paradigm for small-sample regression modeling in heritage conservation. The generated data were successfully applied to assess the degradation of wooden components from the Southern Song Dynasty “Nanhai I” shipwreck and the Qing Dynasty “Zhiyuan” shipwreck. This application provides critical data-driven support for the scientific conservation of waterlogged archaeological artifacts.
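The abstract names WGAN-GP as the augmentation framework but gives no architecture or hyperparameters. The sketch below illustrates only the standard WGAN-GP critic objective:

L = E[D(x_fake)] - E[D(x_real)] + λ · E[(‖∇D(x_hat)‖₂ - 1)²]

Here x_hat is a random interpolate between a real sample and a generated one. Everything in the sketch is an assumption for illustration:

- A sample is a flat list of floats. For example, an NIR spectrum with the MWC, BD, MOR, and FS values appended is one hypothetical way to realize multi-target generation.
- The critic is any Python function.
- λ = 10 is the value commonly used with WGAN-GP, not a setting reported in this study.
- Gradients are approximated by central finite differences, as a dependency-free stand-in for the automatic differentiation a real implementation would use.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical aliases: a sample is a flat vector (e.g. spectrum + target values).
Sample = Sequence[float]
Critic = Callable[[Sample], float]


def numerical_gradient(critic: Critic, x: Sample, h: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of the critic's gradient at x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((critic(x_plus) - critic(x_minus)) / (2.0 * h))
    return grad


def gradient_penalty(critic: Critic, real_batch: Sequence[Sample],
                     fake_batch: Sequence[Sample], rng: random.Random) -> float:
    """Mean of (||grad D(x_hat)||_2 - 1)^2 over random real/fake interpolates."""
    if len(real_batch) != len(fake_batch) or not real_batch:
        raise ValueError("batches must be non-empty and of equal size")
    total = 0.0
    for real, fake in zip(real_batch, fake_batch):
        eps = rng.random()  # one mixing weight per real/fake pair, drawn from [0, 1)
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
        grad_norm = math.sqrt(sum(g * g for g in numerical_gradient(critic, x_hat)))
        total += (grad_norm - 1.0) ** 2
    return total / len(real_batch)


def critic_loss(critic: Critic, real_batch: Sequence[Sample],
                fake_batch: Sequence[Sample], rng: random.Random,
                lam: float = 10.0) -> float:
    """WGAN-GP critic objective: E[D(fake)] - E[D(real)] + lam * penalty."""
    mean_fake = sum(critic(x) for x in fake_batch) / len(fake_batch)
    mean_real = sum(critic(x) for x in real_batch) / len(real_batch)
    return mean_fake - mean_real + lam * gradient_penalty(critic, real_batch, fake_batch, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    real = [[1.0, 0.0], [3.0, 0.0]]
    fake = [[0.0, 0.0], [1.0, 0.0]]
    print(critic_loss(lambda x: 2.0 * x[0], real, fake, rng))
```

Consider a linear critic whose weight vector has norm 2. Its gradient norm is 2 at every point, so the penalty is (2 - 1)² = 1 regardless of where the interpolates fall. A critic with unit-norm weights gives a penalty of zero, which is the 1-Lipschitz behaviour the penalty is designed to encourage.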
About the journal:
The Journal of Cultural Heritage publishes original papers which comprise previously unpublished data and present innovative methods concerning all aspects of science and technology of cultural heritage as well as interpretation and theoretical issues related to preservation.