Enhancing Hydrological Extremes Forecasting Capabilities in Data-Scarce Regions Through Transfer Learning With Data Augmentation
Authors: Yehai Tang, Xiongpeng Tang, Zhanliang Zhu, Chao Gao, Lei Liu, Fubo Zhao, Silong Zhang
Journal: Earth’s Future, Volume 13, Issue 10 (Journal Article, published 2025-09-27)
DOI: 10.1029/2025EF006060
URL: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2025EF006060
PDF: https://agupubs.onlinelibrary.wiley.com/doi/epdf/10.1029/2025EF006060
Impact factor: 8.2 (JCR Q1, Environmental Sciences; Earth Sciences, Region 1)
Citation count: 0
Abstract
Hydrological extremes forecasting in data-scarce basins remains a longstanding challenge in hydrological science. Despite significant advancements in transferring hydrological knowledge from data-rich to data-sparse basins, such as regionalization techniques for hydrological prediction and novel deep learning (DL)-based transfer learning (TL) methods, the application of models trained in data-rich basins introduces inevitable noise into predictions within data-sparse basins. This distortion can misrepresent rainfall-runoff patterns within specific basins. This study introduces a TL framework based on data augmentation (DA-TL) within the context of hydrological modeling. The framework employs augmented rainfall data as input for conceptual models to generate pretraining runoff samples, addressing the challenges of sample scarcity and imbalance in target basins. Subsequently, TL is applied to fine-tune predictions in the target basin, thereby mitigating inappropriate hydrological knowledge transfer associated with cross-basin learning. The DA-TL framework was validated across nine river basins in China, representing three distinct climate zones (semi-arid, semi-humid, and humid regions). Results indicate that the DA-TL approach outperforms current DL methods for regionalized hydrological modeling. Specifically, under varying data scarcity scenarios, DA-TL achieved average Nash–Sutcliffe Efficiency improvements of 3.8% and 1.0% compared to similar-basin modeling and all-basin modeling strategies, respectively. Model interpretability analyses reveal that the effectiveness of the DA-TL framework primarily stems from its adept learning of the runoff generation and routing processes in target basins. These findings underscore the potential of using synthetic data derived from process-based models for pretraining in TL, offering promising avenues for improving hydrological extremes forecasting accuracy in observation-limited regions.
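The workflow summarized in the abstract (augment rainfall, run a conceptual model to produce pretraining runoff, then fine-tune on the target basin and score with Nash–Sutcliffe Efficiency) can be illustrated with a toy sketch. Everything below except the standard NSE formula is an assumption made for illustration: the single linear reservoir stands in for the conceptual model, random rescaling stands in for the augmentation scheme, and a lagged linear regressor trained by gradient descent stands in for the DL model. None of these names or choices come from the paper.

```python
import random

N_LAGS = 5  # hypothetical number of lagged rainfall inputs


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def linear_reservoir(rainfall, k=0.3, storage=0.0):
    """Toy conceptual model (assumption): storage gains rain, releases fraction k per step."""
    runoff = []
    for p in rainfall:
        storage += p
        q = k * storage
        storage -= q
        runoff.append(q)
    return runoff


def augment_rainfall(rainfall, n_copies, rng, low=0.5, high=2.0):
    """Hypothetical augmentation: rescale the whole series by a random factor per copy."""
    factors = [rng.uniform(low, high) for _ in range(n_copies)]
    return [[p * f for p in rainfall] for f in factors]


def lag_features(rainfall, n_lags):
    """Row t holds [p_t, p_(t-1), ..., p_(t-n_lags+1)], zero-padded at the start."""
    padded = [0.0] * (n_lags - 1) + list(rainfall)
    return [padded[t:t + n_lags][::-1] for t in range(len(rainfall))]


def predict(weights, features):
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def fit(weights, features, targets, lr, epochs):
    """Batch gradient descent on mean squared error, starting from `weights`."""
    w = list(weights)
    n = len(targets)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    # Short "observed" record for the target basin (synthetic here, illustration only).
    obs_rain = [rng.random() if rng.random() < 0.3 else 0.0 for _ in range(60)]
    obs_runoff = linear_reservoir(obs_rain, k=0.45)  # stands in for gauged runoff

    # 1) Augment rainfall and run the conceptual model to build pretraining samples.
    X_pre, y_pre = [], []
    for series in augment_rainfall(obs_rain, n_copies=20, rng=rng):
        X_pre += lag_features(series, N_LAGS)
        y_pre += linear_reservoir(series, k=0.3)  # imperfectly calibrated model

    # 2) Pretrain on the synthetic samples; 3) fine-tune on the scarce observations.
    w_pre = fit([0.0] * N_LAGS, X_pre, y_pre, lr=0.01, epochs=200)
    X_obs = lag_features(obs_rain, N_LAGS)
    w_ft = fit(w_pre, X_obs, obs_runoff, lr=0.01, epochs=200)

    # Scores are computed on the fine-tuning data itself, so they are not a validation result.
    print("NSE, pretrained only:", round(nse(obs_runoff, predict(w_pre, X_obs)), 3))
    print("NSE, after fine-tuning:", round(nse(obs_runoff, predict(w_ft, X_obs)), 3))
```

The point of the sketch is the ordering of steps: the pretraining set comes from a process-based model driven by augmented rainfall for the target basin itself, not from other basins, and the observed record is used only in the final fine-tuning stage.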
About the journal:
Earth’s Future: A transdisciplinary open access journal, Earth’s Future focuses on the state of the Earth and the prediction of the planet’s future. By publishing peer-reviewed articles as well as editorials, essays, reviews, and commentaries, this journal will be the preeminent scholarly resource on the Anthropocene. It will also help assess the risks and opportunities associated with environmental changes and challenges.