Nitrous oxide prediction through machine learning and field-based experimentation: A novel strategy for data-driven insights

Impact factor: 3.4 · JCR Q2 (Environmental Sciences)
Muhammad Hassan, Khabat Khosravi, Travis J. Esau, Gurjit S. Randhawa, Aitazaz A. Farooque, Seyyed Ebrahim Hashemi Garmdareh, Yulin Hu, Nauman Yaqoob, Asad T. Jappa
Journal: Atmospheric Environment: X, volume 26, Article 100335
Published: 2025-04-01
DOI: 10.1016/j.aeaoa.2025.100335
URL: https://www.sciencedirect.com/science/article/pii/S2590162125000255

Abstract

Machine learning is increasingly applied to predict complex environmental phenomena such as greenhouse gas (GHG) emissions. This study introduces ensemble learning models that integrate the randomizable filter classifier (RFC), regression by discretization (RBD), and attribute-selected classifier (ASC) with the random forest (RF) algorithm, yielding three hybrid models (RFC-RF, RBD-RF, and ASC-RF). The models were used to predict nitrous oxide (N2O) and water vapor (H2O) emissions from agricultural soils and were benchmarked against a support vector regression (SVR) model. The dataset comprised 401 samples from potato fields in Prince Edward Island (PEI) and 122 samples from New Brunswick (NB), including measurements of N2O and H2O and related input variables such as soil moisture (SM), soil temperature (ST), electrical conductivity (EC), wind speed, solar radiation, relative humidity, precipitation, air temperature (AT), dew point, vapor pressure deficit, and reference evapotranspiration. Input scenarios were selected and optimized using a combination of particle swarm optimization (PSO) and manual methods. Model performance was evaluated with graphical diagnostics (scatter plots, kite diagrams, and density distribution histograms of relative percentage error) and statistical metrics: coefficient of determination (R2), Nash–Sutcliffe efficiency coefficient (NSE), percent bias (PBIAS), coefficient of uncertainty at the 95% confidence level (U95), Kling–Gupta efficiency (KGE), Willmott index of agreement (WI), and Legates and McCabe coefficient of efficiency (LME). The hybrid RFC-RF model outperformed the other models for N2O and H2O predictions in both PEI and NB, followed by the RBD-RF, ASC-RF, and SVR models. Judged by R2, the hybrid models performed well, whereas SVR performance ranged from unacceptable to good. Combining soil and climatic variables improved prediction accuracy, with ST, AT, and soil EC being the most influential variables. SHapley Additive exPlanations (SHAP) analysis confirmed the importance of ST for both N2O and H2O predictions. The findings also suggest that dataset length matters more than input-output correlation. The developed models offer reliable and cost-effective tools for researchers, policymakers, and stakeholders to predict and manage GHG emissions in agricultural contexts.
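Several of the statistical metrics named in the abstract have widely used textbook definitions, and the sketch below shows how they might be computed for a set of observed and predicted values. It is an illustration only and not the authors' code: the function name `evaluate_predictions` and the toy numbers are hypothetical, the PBIAS sign convention (positive when the model underestimates) is an assumption because conventions differ between sources, and U95 is omitted because its definition is not given in the abstract.

```python
import math


def evaluate_predictions(obs, pred):
    """Return common goodness-of-fit metrics for paired observations/predictions.

    Hypothetical helper using standard textbook formulas; the paper may use
    different variants (for example, a different PBIAS sign convention).
    """
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n

    # Error sums between observations and predictions.
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    abs_err = sum(abs(o - p) for o, p in zip(obs, pred))

    # Spread of each series around its own mean.
    sq_dev_o = sum((o - mean_o) ** 2 for o in obs)
    sq_dev_p = sum((p - mean_p) ** 2 for p in pred)
    abs_dev_o = sum(abs(o - mean_o) for o in obs)

    # Pearson correlation coefficient.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sq_dev_o * sq_dev_p)

    # KGE components: variability ratio (alpha) and bias ratio (beta).
    alpha = math.sqrt(sq_dev_p / sq_dev_o)
    beta = mean_p / mean_o

    # Denominator of the Willmott index ("potential error").
    potential = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred)
    )

    return {
        "R2": r ** 2,
        "NSE": 1.0 - sq_err / sq_dev_o,
        "PBIAS": 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs),
        "KGE": 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
        "WI": 1.0 - sq_err / potential,
        "LME": 1.0 - abs_err / abs_dev_o,
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    for name, value in evaluate_predictions(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

The toy example shifts every prediction by a constant, which leaves R2 at its maximum while NSE, KGE, WI, and LME all drop. This is one reason a study might report several metrics side by side rather than relying on R2 alone.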

Source journal: Atmospheric Environment: X (Environmental Science: Environmental Science (all))
CiteScore: 8.00
Self-citation rate: 0.00%
Articles published: 47
Review time: 12 weeks