Muhammad Hassan, Khabat Khosravi, Travis J. Esau, Gurjit S. Randhawa, Aitazaz A. Farooque, Seyyed Ebrahim Hashemi Garmdareh, Yulin Hu, Nauman Yaqoob, Asad T. Jappa
Title: Nitrous oxide prediction through machine learning and field-based experimentation: A novel strategy for data-driven insights
Journal: Atmospheric Environment: X, Volume 26, Article 100335
Published: 2025-04-01
DOI: 10.1016/j.aeaoa.2025.100335
URL: https://www.sciencedirect.com/science/article/pii/S2590162125000255
Abstract
Applying machine learning to predict complex environmental phenomena such as greenhouse gas (GHG) emissions is gaining significant attention. This study introduces ensemble learning models that integrate the randomizable filter classifier (RFC), regression by discretization (RBD), and attribute-selected classifier (ASC) with the random forest (RF) algorithm, resulting in three hybrid models (RFC-RF, RBD-RF, and ASC-RF). These models were used to predict nitrous oxide (N₂O) and water vapor (H₂O) emissions from agricultural soils and were benchmarked against a support vector regression (SVR) model. The dataset comprised 401 samples from potato fields in Prince Edward Island (PEI) and 122 samples from New Brunswick (NB). It included measurements of N₂O and H₂O together with related input variables: soil moisture (SM), soil temperature (ST), electrical conductivity (EC), wind speed, solar radiation, relative humidity, precipitation, air temperature (AT), dew point, vapor pressure deficit, and reference evapotranspiration. Input scenarios were selected and optimized using a combination of particle swarm optimization (PSO) and manual methods. Model performance was evaluated with graphical diagnostics (scatter plots, kite diagrams, and density distribution histograms of relative percentage error) and statistical metrics: coefficient of determination (R²), Nash–Sutcliffe efficiency coefficient (NSE), percent bias (PBIAS), coefficient of uncertainty at the 95% confidence level (U95%), Kling–Gupta efficiency (KGE), Willmott index of agreement (WI), and Legates and McCabe coefficient of efficiency (LME). The hybrid RFC-RF model outperformed the other models for N₂O and H₂O predictions in both PEI and NB, followed by the RBD-RF, ASC-RF, and SVR models. Judged by R², the hybrid models performed well, whereas SVR performance ranged from unacceptable to good.
Combining soil and climatic variables improved prediction accuracy, with ST, AT, and soil EC being the most influential variables. SHapley Additive exPlanations (SHAP) analysis confirmed the importance of ST for both N₂O and H₂O predictions. The findings also suggest that dataset length matters more for predictive performance than the strength of the input-output correlation. The developed models offer reliable and cost-effective tools for researchers, policymakers, and stakeholders to predict and manage GHG emissions in agricultural contexts.
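
For readers unfamiliar with the goodness-of-fit metrics named in the abstract, the following is a minimal sketch of how several of them are commonly computed from paired observed and predicted values. It is not the authors' code. The function names are hypothetical, the formulas are the widely used textbook forms (R² as the squared Pearson correlation, KGE in its original three-component form, LME as the absolute-value analogue of NSE), and the PBIAS sign convention used here (negative means overprediction) is an assumption, since conventions differ between papers. U95 is omitted because its definition varies between studies.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def pearson_r(obs, pred):
    """Pearson correlation; undefined if either series is constant."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return cov / math.sqrt(_sum_sq_dev(obs) * _sum_sq_dev(pred))


def r2(obs, pred):
    """Coefficient of determination, taken here as squared Pearson r."""
    return pearson_r(obs, pred) ** 2


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    return 1 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / _sum_sq_dev(obs)


def pbias(obs, pred):
    """Percent bias. Assumed convention: negative means overprediction."""
    return 100 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)


def kge(obs, pred):
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    r = pearson_r(obs, pred)
    alpha = math.sqrt(_sum_sq_dev(pred) / _sum_sq_dev(obs))  # std-dev ratio
    beta = _mean(pred) / _mean(obs)                          # mean ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def willmott(obs, pred):
    """Willmott index of agreement (original squared-error form)."""
    mo = _mean(obs)
    denom = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / denom


def lme(obs, pred):
    """Legates-McCabe efficiency: NSE with absolute instead of squared errors."""
    mo = _mean(obs)
    num = sum(abs(o - p) for o, p in zip(obs, pred))
    return 1 - num / sum(abs(o - mo) for o in obs)


if __name__ == "__main__":
    # Illustrative numbers only; these are not data from the study.
    observed = [0.8, 1.1, 1.5, 2.0, 2.4]
    predicted = [0.9, 1.0, 1.6, 1.8, 2.5]
    for name, fn in [("R2", r2), ("NSE", nse), ("PBIAS", pbias),
                     ("KGE", kge), ("WI", willmott), ("LME", lme)]:
        print(f"{name}: {fn(observed, predicted):.3f}")
```

Reporting several of these together, as the study does, is useful because each one penalizes a different kind of error: NSE and WI are dominated by large squared deviations, LME is less sensitive to outliers, PBIAS isolates systematic over- or underprediction, and KGE separates correlation, variability, and bias.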