A hybrid machine learning approach to identify potential green cover area for bio–physical suitability mapping in the western semi–arid Rarh region of West Bengal, Purulia

Impact factor: 3.0 | Zone 4, Environmental Science and Ecology | JCR Q3, Environmental Sciences

Environmental Monitoring and Assessment | Pub Date: 2026-05-08 | DOI: 10.1007/s10661-026-15404-z

Bikash Manna, Shweta Rani

Environmental Monitoring and Assessment, volume 198 (6). Full text: https://link.springer.com/article/10.1007/s10661-026-15404-z


Abstract

Forest cover restoration is urgently needed in a semi–arid district of West Bengal, where land degradation endangers environmental stability and community welfare. The present study introduces and validates a robust, data–driven machine learning framework for identifying optimal afforestation sites, with the aim of enhancing climate adaptability and creating sustainable, forest–centric livelihood opportunities. The methodology is structured as a sequential, hybrid workflow. First, an unsupervised K–Means clustering algorithm was applied to a suite of eleven environmental variables derived from SRTM, Landsat, and national geospatial databases to perform an exploratory delineation of potential zones. Next, training data were generated by manually digitizing samples and validating them visually against high–resolution imagery in Google Earth Pro. This dataset then served as the basis for training two supervised algorithms: Random Forest (RF) and XGBoost. A rigorous comparative evaluation confirmed the superior predictive power of the Random Forest model, which achieved an overall accuracy of 89.1% and an Area Under the ROC Curve (AUC) of 0.9508. An interpretability analysis using SHAP further revealed that slope, soil moisture, and elevation were the most critical determinants of suitability. The primary outcome is a spatially explicit suitability map that identifies 20.9% of the district's area as potentially suitable for afforestation. The map serves as a decision–support tool, enabling policymakers and community stakeholders to implement strategic and effective afforestation programs in the study area.
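The abstract reports three headline numbers: overall accuracy, AUC, and the share of the district mapped as suitable. The sketch below shows, under stated assumptions, how such figures are commonly computed from validation samples and a classified raster. It is not the authors' code: the function names, the toy labels and scores, and the `-1` no-data marker are all hypothetical, and the area share assumes equal-area pixels.

```python
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of validation samples whose predicted class matches the reference."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen suitable sample (label 1)
    receives a higher score than a randomly chosen unsuitable one (label 0).
    Tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def suitable_share(class_map: Sequence[Sequence[int]], suitable: int = 1,
                   nodata: int = -1) -> float:
    """Percentage of valid pixels labelled suitable.
    Assumes equal-area pixels; `nodata` marks pixels outside the district
    (a hypothetical convention for this sketch)."""
    valid = [v for row in class_map for v in row if v != nodata]
    if not valid:
        raise ValueError("class_map contains no valid pixels")
    return 100.0 * sum(1 for v in valid if v == suitable) / len(valid)


# Toy validation set (hypothetical): 1 = suitable, 0 = unsuitable.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print("overall accuracy:", overall_accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
print("suitable share (%):", suitable_share([[1, 0, -1], [0, 0, -1]]))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small validation set; a rank-based formulation would be the usual choice for larger ones.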



Source journal

Environmental Monitoring and Assessment (Environmental Sciences)

CiteScore: 4.70

Self-citation rate: 6.70%

Articles published: 1000

Review time: 7.3 months

Journal description: Environmental Monitoring and Assessment emphasizes technical developments and data arising from environmental monitoring and assessment, the use of scientific principles in the design of monitoring systems at the local, regional and global scales, and the use of monitoring data in assessing the consequences of natural resource management actions and pollution risks to man and the environment.