Denoising Autoencoder for Reconstructing Sensor Observation Data and Predicting Evapotranspiration: Noisy and Missing Values Repair and Uncertainty Quantification
Timothy K. Johnsen, Xiangyu Bi, Chunwei Chou, Charuleka Varadharajan, Yuxin Wu, Jonathan Skone, Lavanya Ramakrishnan
Water Resources Research. Published 2025-09-30. DOI: 10.1029/2024wr039831 (https://doi.org/10.1029/2024wr039831)
Abstract
Machine learning (ML) methods applied in scientific research often deal with interrelated features in high‐dimensional data. Reducing data noise and redundancy is needed to increase prediction accuracy and efficiency, especially when dealing with data from field sensors. We explored an unsupervised learning method, the denoising autoencoder (DAE), to extract the underlying data structure from noisy raw data in the context of predicting hydrologic quantities from multiple field sensors. These sensors have intrinsic instrumental noise and occasional malfunctions that cause missing values. Our DAE neural network reconstructed meteorological sensor data containing noise and missing values to predict evapotranspiration (ET) in a mountainous watershed. The DAE reconstructed the sensor variables with a mean coefficient of determination of 0.77 across 15 dimensions representing individual sensors. It reduced variance and bias uncertainties compared to a classical autoencoder model. The reconstruction quality varied across dimensions depending on their cross‐correlation and alignment with the underlying data structure. Uncertainties arising from the model structure were overall higher than those resulting from data corruption. We attached the DAE structure to a downstream ET‐prediction neural network in three formats and achieved reasonably accurate ET predictions. The use of the DAE notably reduced variance uncertainty in ET prediction. However, excessive variance reduction may be accompanied by an increase in bias due to the intrinsic bias‐variance tradeoff. Our method of evaluating and reducing uncertainties in aggregated data from different sources can be used to improve predictive models, process understanding, and uncertainty quantification for better water resource management.
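The denoising idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the network size, the corruption scheme (Gaussian noise plus zero-filled missing values), the synthetic 15-dimensional data, and all function names (`corrupt`, `train_dae`, `r2_per_dim`, `make_synthetic_sensors`) are assumptions chosen for illustration. The sketch trains a one-hidden-layer autoencoder to map corrupted inputs back to clean targets and then scores the reconstruction with a per-dimension coefficient of determination.

```python
import numpy as np


def make_synthetic_sensors(n_samples=500, n_dims=15, n_latent=3, seed=1):
    """Hypothetical stand-in for sensor data: correlated columns driven by a few latent factors."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, n_latent))
    mixing = rng.normal(size=(n_latent, n_dims))
    x = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_dims))
    return (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each dimension


def corrupt(x, rng, noise_std=0.1, missing_frac=0.1):
    """Add Gaussian noise, then blank out a random fraction of entries."""
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    missing = rng.random(x.shape) < missing_frac
    noisy[missing] = 0.0  # assumption: missing values are filled with 0 (the standardized mean)
    return noisy


def forward(params, x_in):
    """tanh bottleneck layer followed by a linear decoder."""
    hidden = np.tanh(x_in @ params["w1"] + params["b1"])
    return hidden, hidden @ params["w2"] + params["b2"]


def train_dae(x_clean, n_hidden=4, epochs=1000, lr=0.02, seed=0):
    """Full-batch gradient descent on the MSE between reconstruction and clean target."""
    rng = np.random.default_rng(seed)
    n_dims = x_clean.shape[1]
    params = {
        "w1": rng.normal(0.0, 0.1, (n_dims, n_hidden)), "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, (n_hidden, n_dims)), "b2": np.zeros(n_dims),
    }
    losses = []
    for _ in range(epochs):
        x_in = corrupt(x_clean, rng)  # fresh corruption each epoch
        hidden, recon = forward(params, x_in)
        err = recon - x_clean  # the target is the clean data, not the corrupted input
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        grad_out = 2.0 * err / err.size
        grad_hidden = (grad_out @ params["w2"].T) * (1.0 - hidden ** 2)
        grads = {
            "w2": hidden.T @ grad_out, "b2": grad_out.sum(axis=0),
            "w1": x_in.T @ grad_hidden, "b1": grad_hidden.sum(axis=0),
        }
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses


def r2_per_dim(y_true, y_pred):
    """Coefficient of determination computed separately for each column."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


x = make_synthetic_sensors()
params, losses = train_dae(x)
x_corrupted = corrupt(x, np.random.default_rng(2))
_, x_recon = forward(params, x_corrupted)
print("mean R^2 across dimensions:", r2_per_dim(x, x_recon).mean())
```

A classical autoencoder would differ only in feeding `x_clean` as the input instead of a corrupted copy. The printed score depends on the synthetic data and settings assumed here and is not comparable to the 0.77 reported in the abstract.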
About the journal:
Water Resources Research (WRR) is an interdisciplinary journal that focuses on hydrology and water resources. It publishes original research in the natural and social sciences of water. It emphasizes the role of water in the Earth system, covering physical, chemical, biological, and ecological processes in water resources research and management, as well as their social, policy, and public health implications. It encompasses observational, experimental, theoretical, analytical, numerical, and data-driven approaches that advance the science of water and its management. Submissions are evaluated for novelty, accuracy, significance, and the broader implications of their findings.