[Estimation of Soil Organic Carbon Content in Gannan Grassland Based on SSA Optimized CatBoost].
Zi-Ming Ma, Mei-Ling Zhang, Xing-Yu Liu
环境科学, 46(8): 4961-4970. Published 2025-08-08. Journal Article.
DOI: 10.13227/j.hjkx.202408081 (https://doi.org/10.13227/j.hjkx.202408081)
Estimating soil organic carbon (SOC) content in Gannan Tibetan Autonomous Prefecture, characterizing its spatial distribution, and identifying its main influencing factors are important for improving grassland quality, optimizing management, regulating climate, and maintaining ecosystem functions. Taking the grassland of Gannan Tibetan Autonomous Prefecture, Gansu Province, as the study area, a multi-feature dataset was built by integrating soil properties, meteorological factors, elevation, and vegetation indices, and 24 significant feature factors were selected using Pearson correlation analysis. Normalized feature contributions were then derived from SHAP values. The data were split into training and test sets at a ratio of 8:2, and results were obtained by ten-fold cross-validation. Models were evaluated with MAE, RMSE, and R², and the sparrow search algorithm (SSA) and whale optimization algorithm (WOA) were used to optimize the model parameters for estimating SOC content. The results showed that the modeled surface SOC stocks of the grassland in Gannan Tibetan Autonomous Prefecture decreased gradually from west to east, being high in the northwest and low in the southeast; the northwest had a relatively low average temperature and a high organic carbon content. Annual average temperature, the enhanced vegetation index (EVI), and elevation (from the digital elevation model, DEM) contributed significantly to the SOC content of the Gannan grassland and were the main factors affecting the spatial distribution of SOC. Among random forest, decision tree, gradient boosting regression, CatBoost, XGBoost, and LightGBM, the CatBoost model performed best on the test set. The convergence curves of SSA and WOA showed that SSA converged faster and updated the parameters more effectively.
The optimized SSA-CatBoost model performed best in predicting SOC content. The spatial distribution of SOC has an important influence on the regional ecosystem and carbon cycle. The grassland in northwestern Gannan has greater potential for soil fertility and carbon storage; this finding can help in formulating more effective soil management and ecological protection strategies, slowing climate warming, and promoting the sustainable development of the global ecosystem.
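
The evaluation protocol described in the abstract (an 8:2 hold-out split, ten-fold cross-validation, and the MAE, RMSE, and R² metrics) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the sample count of 100, the random seed, the choice to run the ten folds inside the training portion, and the toy values are all hypothetical, and no CatBoost model or SOC data is involved.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) at 8:2 by default."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Hypothetical sample count; the abstract does not report one.
train_idx, test_idx = holdout_split(100)

# Assumption: the ten folds are drawn from the training portion only.
splits = list(k_fold_indices(train_idx, k=10))

# Toy values, only to show how the three metrics are computed.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

In a real workflow, each candidate parameter set proposed by an optimizer such as SSA or WOA would be scored by averaging one of these metrics over the validation folds, and the held-out test indices would be used only once for the final comparison between models.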