Title: Inclusion of fractal dimension in machine learning models improves the prediction accuracy of hydraulic conductivity
Authors: Abhradip Sarkar, Pragati Pramanik Maity, Mrinmoy Ray, Aditi Kundu
Journal: Stochastic Environmental Research and Risk Assessment (Impact Factor 3.9; JCR Q1, Engineering, Civil)
Publication date: 2024-08-11
Publication type: Journal Article
DOI: 10.1007/s00477-024-02793-1 (https://doi.org/10.1007/s00477-024-02793-1)
Citation count: 0
Abstract
Measuring hydraulic conductivity (HC) in the field and laboratory is time-consuming, laborious, and expensive. Pedo-transfer functions can instead predict soil HC from easy-to-measure soil properties such as bulk density (BD), soil texture, fractal dimension (D), organic carbon (OC), and glomalin content. In this study, 121 soil samples were used to predict HC with Multiple Linear Regression and four machine learning models: Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Trees (CART), and Random Forest (RF). Two sets of input data were used: dataset 1 (texture data, BD, OC, and glomalin content) and dataset 2 (D, BD, OC, and glomalin content). The models were evaluated using Mean Absolute Error, Mean Absolute Percentage Error, Nash–Sutcliffe model efficiency, Root Mean Square Error (RMSE), and the correlation coefficient. ANN with three hidden layers performed well for both input sets. When D was added to the ANN input set, RMSE decreased by 17% on the training dataset and by 5.55% on the testing dataset. For both datasets, RF outperformed CART in predicting HC. SVM with dataset 2 outperformed all other models, indicating that including D in the input set allows HC to be predicted more accurately. However, further study with different combinations of input data is required to evaluate the prediction efficiency of machine learning models for various regions.
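The five evaluation measures named in the abstract have standard textbook definitions, which the following minimal Python sketch illustrates. It is not the authors' code: the function names `evaluate` and `percent_reduction` are hypothetical, the correlation coefficient is assumed to be Pearson's r, and MAPE is assumed to be expressed in percent of the observed value.

```python
import math


def evaluate(observed, predicted):
    """Return MAE, MAPE (%), RMSE, Nash-Sutcliffe efficiency and Pearson r.

    Hypothetical helper using the standard definitions; observed values
    must be non-zero (for MAPE) and must not all be identical (for NSE, r).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residual sum of squares and spread of each series around its mean.
    ss_res = sum(e * e for e in errors)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))

    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "RMSE": math.sqrt(ss_res / n),
        # NSE is 1 for a perfect fit and 0 when the model is no better
        # than always predicting the observed mean.
        "NSE": 1.0 - ss_res / ss_obs,
        "r": covariance / math.sqrt(ss_obs * ss_pred),
    }


def percent_reduction(before, after):
    """Relative decrease of an error measure, in percent of the 'before' value."""
    return 100.0 * (before - after) / before
```

Calling `evaluate` once per model and input set gives a like-for-like comparison, and `percent_reduction` expresses a change in RMSE between the two input sets in the same form as the percentages quoted above.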
About the journal:
Stochastic Environmental Research and Risk Assessment (SERRA) will publish research papers, reviews and technical notes on stochastic and probabilistic approaches to environmental sciences and engineering, including interactions of earth and atmospheric environments with people and ecosystems. The basic idea is to bring together research papers on stochastic modelling in various fields of environmental sciences and to provide an interdisciplinary forum for the exchange of ideas, for communicating on issues that cut across disciplinary barriers, and for the dissemination of stochastic techniques used in different fields to the community of interested researchers. Original contributions will be considered dealing with modelling (theoretical and computational), measurements and instrumentation in one or more of the following topical areas:
- Spatiotemporal analysis and mapping of natural processes.
- Enviroinformatics.
- Environmental risk assessment, reliability analysis and decision making.
- Surface and subsurface hydrology and hydraulics.
- Multiphase porous media domains and contaminant transport modelling.
- Hazardous waste site characterization.
- Stochastic turbulence and random hydrodynamic fields.
- Chaotic and fractal systems.
- Random waves and seafloor morphology.
- Stochastic atmospheric and climate processes.
- Air pollution and quality assessment research.
- Modern geostatistics.
- Mechanisms of pollutant formation, emission, exposure and absorption.
- Physical, chemical and biological analysis of human exposure from single and multiple media and routes; control and protection.
- Bioinformatics.
- Probabilistic methods in ecology and population biology.
- Epidemiological investigations.
- Models using stochastic differential equations or stochastic partial differential equations.