{"title":"Covariate selection approaches in spatial prediction of soil quality indices using machine learning models at the watershed scale, west of Iran","authors":"Marziyeh Zandi Baghche-Maryam , Mohsen Sheklabadi , Shamsollah Ayoubi","doi":"10.1016/j.still.2025.106571","DOIUrl":null,"url":null,"abstract":"<div><div>Assessing soil quality indices (SQI’s) is a fundamental approach for agricultural and natural resources as well as sustainable management practices. This study addresses the digital mapping of SQI’s, with the objective of comparing the efficacy of four variable selection methods in Hamadan Province (west of Iran). The following methods were utilized to evaluate the predictive power of three machine learning (ML) models: principal component analysis (PCA), Boruta, recursive feature elimination (RFE), and random forest (RF). The ML models include artificial neural network (ANN), random forest (RF), and Cubist algorithm. Environmental variables were extracted from the digital elevation model (DEM) and Sentinel-2 image and employed in three scenarios: (i) topographic attributes, (ii) remote sensing data, and (iii) integration of scenarios (i) and (ii). A systematic and random grid sampling method was employed to collect 150 soil samples. Surface soil samples, were collected from 0–25 cm depth in agricultural and rangeland areas. The dominant soil groups were Xerorthents, Calcixerepts, and Haploxerepts. The samples were analyzed for physical and chemical properties, and the minimum data set (MDS) was determined by applying PCA. The indicators were scored using both linear and non-linear functions, and the SQI’s were calculated using the Additive Soil Quality Index (SQIa), the Weighted Soil Quality Index (SQIw), and the Nemoro Soil Quality Index (SQIn) methods. The best performances of the SQI’s were observed for SQIn derived from MDS and TDS using linear scoring. The results showed that scenario (iii) consistently yielded the most accurate predictions. 
The Boruta method and Cubist algorithm produced R² and RMSE values of 0.84 and 0.023, respectively, which were the most optimal in this context. The accuracy assessment demonstrated that the Boruta method and Cubist algorithm exhibited the highest accuracy in all three scenarios for predicting SQI. The uncertainty assessment revealed that the northwestern of the studied regions exhibited a higher degree of uncertainty, which can be attributed to the high diversity in topographic and soil attributes. The findings of this study offer a framework for developing spatial-based models to generate soil quality maps at a large scale, thereby facilitating informed decision-making for the further future land use plannings.</div></div>","PeriodicalId":49503,"journal":{"name":"Soil & Tillage Research","volume":"252 ","pages":"Article 106571"},"PeriodicalIF":6.1000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Soil & Tillage Research","FirstCategoryId":"97","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167198725001254","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOIL SCIENCE","Score":null,"Total":0}
Citations: 0
Abstract
Assessing soil quality indices (SQIs) is fundamental to managing agricultural and natural resources and to sustainable management practices. This study addresses the digital mapping of SQIs in Hamadan Province (west of Iran), with the objective of comparing the efficacy of four variable selection methods: principal component analysis (PCA), Boruta, recursive feature elimination (RFE), and random forest (RF). Each method was evaluated by its effect on the predictive power of three machine learning (ML) models: artificial neural network (ANN), random forest (RF), and the Cubist algorithm. Environmental variables were extracted from a digital elevation model (DEM) and a Sentinel-2 image and employed in three scenarios: (i) topographic attributes, (ii) remote sensing data, and (iii) the combination of (i) and (ii). A systematic and random grid sampling method was employed to collect 150 surface soil samples from the 0–25 cm depth in agricultural and rangeland areas. The dominant soil groups were Xerorthents, Calcixerepts, and Haploxerepts. The samples were analyzed for physical and chemical properties, and the minimum data set (MDS) was determined by applying PCA. The indicators were scored using both linear and non-linear functions, and the SQIs were calculated using the Additive Soil Quality Index (SQIa), the Weighted Soil Quality Index (SQIw), and the Nemoro Soil Quality Index (SQIn) methods. Among the SQIs, SQIn derived from the MDS and from the total data set (TDS) with linear scoring performed best. Scenario (iii) consistently yielded the most accurate predictions. The combination of the Boruta method and the Cubist algorithm produced the best results, with an R² of 0.84 and an RMSE of 0.023, and the accuracy assessment showed that this combination predicted SQI most accurately in all three scenarios. The uncertainty assessment revealed that the northwestern part of the study area exhibited a higher degree of uncertainty, which can be attributed to the high diversity in topographic and soil attributes. The findings of this study offer a framework for developing spatial models to generate soil quality maps at a large scale, thereby facilitating informed decision-making in future land use planning.
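The abstract names three index formulations (SQIa, SQIw, SQIn) and linear scoring but does not state their formulas. The sketch below assumes the forms commonly used in the soil quality literature: linear scores of x/max ("more is better") or min/x ("less is better"), SQIa as the mean score, SQIw as a weighted sum with weights normalized to 1, and SQIn as sqrt((P_ave² + P_min²)/2) × (n − 1)/n. The function names, the example scores, and the weights are hypothetical and are not taken from the study.

```python
import math


def linear_score(values, more_is_better=True):
    """Assumed linear scoring: x/max for 'more is better', min/x for 'less is better'."""
    if more_is_better:
        top = max(values)
        return [v / top for v in values]
    low = min(values)
    return [low / v for v in values]


def sqi_additive(scores):
    """Assumed SQIa: arithmetic mean of the indicator scores."""
    return sum(scores) / len(scores)


def sqi_weighted(scores, weights):
    """Assumed SQIw: weighted sum of scores, weights rescaled to sum to 1."""
    total = sum(weights)
    return sum((w / total) * s for w, s in zip(weights, scores))


def sqi_nemoro(scores):
    """Assumed SQIn: combines the average and the minimum score, scaled by (n - 1) / n."""
    n = len(scores)
    p_ave = sum(scores) / n
    p_min = min(scores)
    return math.sqrt((p_ave ** 2 + p_min ** 2) / 2) * (n - 1) / n


# Hypothetical scored indicators for one soil sample (not data from the study).
example_scores = [0.8, 0.6, 1.0, 0.4]
# Hypothetical weights, e.g. derived from PCA communalities.
example_weights = [0.4, 0.3, 0.2, 0.1]

sqi_a = sqi_additive(example_scores)
sqi_w = sqi_weighted(example_scores, example_weights)
sqi_n = sqi_nemoro(example_scores)
```

Because SQIn includes the minimum score, a single poorly performing indicator lowers it more than it lowers the additive or weighted index. This may be one reason the formulations rank samples differently.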
About the journal:
Soil & Tillage Research examines the physical, chemical and biological changes in the soil caused by tillage and field traffic. Manuscripts will be considered on aspects of soil science, physics, technology, mechanization and applied engineering for a sustainable balance among productivity, environmental quality and profitability. The following are examples of suitable topics within the scope of Soil & Tillage Research:
The agricultural and biosystems engineering associated with tillage (including no-tillage, reduced-tillage and direct drilling), irrigation and drainage, crops and crop rotations, fertilization, rehabilitation of mine spoils and processes used to modify soils.
Soil change effects on establishment and yield of crops, growth of plants and roots, structure and erosion of soil, cycling of carbon and nutrients, greenhouse gas emissions, leaching, runoff and other processes that affect environmental quality.
Characterization or modeling of tillage and field traffic responses, soil, climate, or topographic effects, soil deformation processes, tillage tools, traction devices, energy requirements, economics, surface and subsurface water quality effects, tillage effects on weed, pest and disease control, and their interactions.