Ahmed Elbeltagi, Dinesh Kumar Vishwakarma, Okan Mert Katipoğlu, Kallem Sushanth, Salim Heddam, Bhaskar Pratap Singh, Abhishek Shukla, Vinay Kumar Gautam, Chaitanya Baliram Pande, Saddam Hussain, Subhankar Ghosh, Hossein Dehghanisanij, Ali Salem
Scientific Reports 15(1), 20200. Published 2025-06-20. DOI: https://doi.org/10.1038/s41598-025-06277-2
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12181251/pdf/
Air temperature estimation and modeling using data driven techniques based on best subset regression model in Egypt.
Air temperature plays a critical role in estimating agricultural water requirements, hydrological processes, and climate change impacts. This study aims to identify the most accurate forecasting model for daily minimum (Tmin) and maximum (Tmax) temperatures in a semi-arid environment. Five machine learning models, namely linear regression (LR), additive regression (AR), support vector machine (SVM), random subspace (RSS), and M5 pruned (M5P), were compared for Tmax and Tmin forecasting in Gharbia Governorate, Egypt, using data from 1979 to 2014. The dataset was divided into 75% for training and 25% for testing. Model input combinations were selected by best subset regression analysis. The best combination for daily minimum temperature forecasting was Tmin(t-1), Tmin(t-3), Tmin(t-4), Tmin(t-5), Tmin(t-6), Tmin(t-7), Tmin(t-8); for daily maximum temperature forecasting it was Tmax(t-1), Tmax(t-2), Tmax(t-3), Tmax(t-4), Tmax(t-5), Tmax(t-6), Tmax(t-8). The M5P model outperformed the other models in predicting both Tmax and Tmin. For Tmin, the M5P model achieved the lowest root mean square error (RMSE) of 2.4881 °C, mean absolute error (MAE) of 1.9515, and relative absolute error (RAE) of 40.4887, alongside the highest Nash-Sutcliffe efficiency (NSE) of 0.8048 and Pearson correlation coefficient (PCC) of 0.8971. In Tmax forecasting, M5P showed a lower RMSE of 2.7696 °C, MAE of 1.9867, and RAE of 29.5440, and a higher NSE of 0.8720 and R² of 0.8720. These results suggest that M5P is a robust and precise model for temperature forecasting, significantly outperforming the LR, AR, RSS, and SVM models. The findings provide valuable insights for improving decision-making in areas such as water resource management, irrigation systems, and agricultural productivity, offering a reliable tool for enhancing operational efficiency and sustainability in semi-arid regions. The Friedman ANOVA and Dunn's test confirm significant differences among the temperature forecasting models. Additive Regression overestimates, while Linear Regression and SVM align closely with actual values. Random Subspace and M5P exhibit high variability, with SVM differing significantly. For maximum temperature, Random Subspace and M5P perform similarly, while SVM remains distinct.
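The input construction and evaluation metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The lag sets are taken from the abstract; everything else is an assumption made for illustration: the function names, the chronological (unshuffled) 75/25 split, the textbook metric definitions (RAE here is normalised by the mean of the observed values and reported in percent), and the synthetic series in the demo. The demo uses a persistence baseline (tomorrow equals today) as a stand-in model, since M5P itself is not implemented here.

```python
import math

# Lag sets reported in the abstract (days before the forecast day t).
TMIN_LAGS = [1, 3, 4, 5, 6, 7, 8]
TMAX_LAGS = [1, 2, 3, 4, 5, 6, 8]


def make_lagged(series, lags):
    """Return (X, y): each row of X holds series[t - k] for k in lags, y holds series[t]."""
    max_lag = max(lags)
    rows = range(max_lag, len(series))
    X = [[series[t - k] for k in lags] for t in rows]
    y = [series[t] for t in rows]
    return X, y


def chrono_split(X, y, train_frac=0.75):
    """Split without shuffling (assumption: the abstract does not say how the split was made)."""
    cut = int(len(y) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    """Mean absolute error, in the units of the data."""
    return _mean([abs(o - p) for o, p in zip(obs, pred)])


def rae(obs, pred):
    """Relative absolute error in percent, relative to always predicting mean(obs)."""
    m = _mean(obs)
    return 100.0 * sum(abs(o - p) for o, p in zip(obs, pred)) / sum(abs(o - m) for o in obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean-of-observations predictor."""
    m = _mean(obs)
    return 1.0 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / sum((o - m) ** 2 for o in obs)


def pcc(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))


if __name__ == "__main__":
    # Synthetic two-year seasonal series (hypothetical data, not the Gharbia record).
    series = [15.0 + 8.0 * math.sin(2 * math.pi * d / 365) for d in range(730)]
    X, y = make_lagged(series, TMIN_LAGS)
    X_tr, y_tr, X_te, y_te = chrono_split(X, y)
    # Persistence baseline: the first column of X is the lag-1 value.
    pred = [row[0] for row in X_te]
    print("RMSE", rmse(y_te, pred), "MAE", mae(y_te, pred), "RAE%", rae(y_te, pred))
    print("NSE", nse(y_te, pred), "PCC", pcc(y_te, pred))
```

Any regressor can replace the persistence baseline by fitting on `X_tr, y_tr` and predicting on `X_te`; the metric functions only need the observed and predicted lists.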
Journal introduction:
We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections.
Scientific Reports has a 2-year impact factor of 4.380 (2021) and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
•Engineering
Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
•Physical sciences
Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature — often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
•Earth and environmental sciences
Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
•Biological sciences
Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms from microorganisms, animals to plants.
•Health sciences
The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.