Ozgur Kisi, Salim Heddam, Kulwinder Singh Parmar, Andrea Petroselli, Christoph Külls, Mohammad Zounemat-Kermani
{"title":"整合高斯过程回归和K均值聚类,增强短期降雨径流模型。","authors":"Ozgur Kisi, Salim Heddam, Kulwinder Singh Parmar, Andrea Petroselli, Christoph Külls, Mohammad Zounemat-Kermani","doi":"10.1038/s41598-025-91339-8","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate rainfall-runoff modeling is crucial for effective watershed management, hydraulic infrastructure safety, and flood mitigation. However, predicting rainfall-runoff remains challenging due to the nonlinear interplay between hydro-meteorological and topographical variables. This study introduces a hybrid Gaussian process regression (GPR) model integrated with K-means clustering (GPR-K-means) for short-term rainfall-runoff forecasting. The Orgeval watershed in France serves as the study area, providing hourly precipitation and streamflow data spanning 1970-2012. The performance of the GPR-K-means model is compared with standalone GPR and principal component regression (PCR) models across four forecasting horizons: 1-hour, 6-hour, 12-hour, and 24-hour ahead. The results reveal that the GPR-K-means model significantly improves forecasting accuracy across all lead times, with a Nash-Sutcliffe Efficiency (NSE) of approximately 0.999, 0.942, 0.891, and 0.859 for 1-hour, 6-hour, 12-hour, and 24-hour forecasts, respectively. These results outperform other ML models, such as Long Short-Term Memory, Support Vector Machines, and Random Forest, reported in the literature. The GPR-K-means model demonstrates enhanced reliability and robustness in hourly streamflow forecasting, emphasizing its potential for broader application in hydrological modeling. 
Furthermore, this study provides a novel methodology for combining clustering and Bayesian regression techniques in surface hydrology, contributing to more accurate and timely flood prediction.</p>","PeriodicalId":21811,"journal":{"name":"Scientific Reports","volume":"15 1","pages":"7444"},"PeriodicalIF":3.9000,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876638/pdf/","citationCount":"0","resultStr":"{\"title\":\"Integration of Gaussian process regression and K means clustering for enhanced short term rainfall runoff modeling.\",\"authors\":\"Ozgur Kisi, Salim Heddam, Kulwinder Singh Parmar, Andrea Petroselli, Christoph Külls, Mohammad Zounemat-Kermani\",\"doi\":\"10.1038/s41598-025-91339-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Accurate rainfall-runoff modeling is crucial for effective watershed management, hydraulic infrastructure safety, and flood mitigation. However, predicting rainfall-runoff remains challenging due to the nonlinear interplay between hydro-meteorological and topographical variables. This study introduces a hybrid Gaussian process regression (GPR) model integrated with K-means clustering (GPR-K-means) for short-term rainfall-runoff forecasting. The Orgeval watershed in France serves as the study area, providing hourly precipitation and streamflow data spanning 1970-2012. The performance of the GPR-K-means model is compared with standalone GPR and principal component regression (PCR) models across four forecasting horizons: 1-hour, 6-hour, 12-hour, and 24-hour ahead. The results reveal that the GPR-K-means model significantly improves forecasting accuracy across all lead times, with a Nash-Sutcliffe Efficiency (NSE) of approximately 0.999, 0.942, 0.891, and 0.859 for 1-hour, 6-hour, 12-hour, and 24-hour forecasts, respectively. 
These results outperform other ML models, such as Long Short-Term Memory, Support Vector Machines, and Random Forest, reported in the literature. The GPR-K-means model demonstrates enhanced reliability and robustness in hourly streamflow forecasting, emphasizing its potential for broader application in hydrological modeling. Furthermore, this study provides a novel methodology for combining clustering and Bayesian regression techniques in surface hydrology, contributing to more accurate and timely flood prediction.</p>\",\"PeriodicalId\":21811,\"journal\":{\"name\":\"Scientific Reports\",\"volume\":\"15 1\",\"pages\":\"7444\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-03-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876638/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific Reports\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1038/s41598-025-91339-8\",\"RegionNum\":2,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Reports","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41598-025-91339-8","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0
Integration of Gaussian process regression and K means clustering for enhanced short term rainfall runoff modeling.
Abstract
Accurate rainfall-runoff modeling is crucial for effective watershed management, hydraulic infrastructure safety, and flood mitigation. However, predicting rainfall-runoff remains challenging due to the nonlinear interplay between hydro-meteorological and topographical variables. This study introduces a hybrid Gaussian process regression (GPR) model integrated with K-means clustering (GPR-K-means) for short-term rainfall-runoff forecasting. The Orgeval watershed in France serves as the study area, providing hourly precipitation and streamflow data spanning 1970-2012. The performance of the GPR-K-means model is compared with standalone GPR and principal component regression (PCR) models across four forecasting horizons: 1-hour, 6-hour, 12-hour, and 24-hour ahead. The results reveal that the GPR-K-means model significantly improves forecasting accuracy across all lead times, with a Nash-Sutcliffe Efficiency (NSE) of approximately 0.999, 0.942, 0.891, and 0.859 for 1-hour, 6-hour, 12-hour, and 24-hour forecasts, respectively. These results outperform other ML models, such as Long Short-Term Memory, Support Vector Machines, and Random Forest, reported in the literature. The GPR-K-means model demonstrates enhanced reliability and robustness in hourly streamflow forecasting, emphasizing its potential for broader application in hydrological modeling. Furthermore, this study provides a novel methodology for combining clustering and Bayesian regression techniques in surface hydrology, contributing to more accurate and timely flood prediction.
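The abstract reports skill as Nash-Sutcliffe Efficiency (NSE). As a minimal sketch of how that score is conventionally computed (the standard textbook definition, not code from the paper; the function name and the sample values are hypothetical), NSE is one minus the ratio of squared forecast error to the variance of the observations about their mean:

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Standard NSE: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the forecast against the observations.
    error_sum = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    spread_sum = sum((o - mean_obs) ** 2 for o in observed)
    if spread_sum == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - error_sum / spread_sum


# Hypothetical streamflow values, for illustration only (not data from the study).
observed_flow = [1.0, 2.0, 3.0, 4.0]
perfect_score = nash_sutcliffe_efficiency(observed_flow, observed_flow)
mean_only_score = nash_sutcliffe_efficiency(observed_flow, [2.5, 2.5, 2.5, 2.5])
```

Under this definition a perfect forecast scores 1, and a forecast that always predicts the observed mean scores 0, which gives a scale for reading the reported values of roughly 0.999 to 0.859 across the 1-hour to 24-hour lead times.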
About the journal:
We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections.
Scientific Reports has a 2-year impact factor: 4.380 (2021), and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
•Engineering
Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
•Physical sciences
Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature — often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
•Earth and environmental sciences
Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
•Biological sciences
Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms, from microorganisms to animals and plants.
•Health sciences
The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.