Denoising Autoencoder for Reconstructing Sensor Observation Data and Predicting Evapotranspiration: Noisy and Missing Values Repair and Uncertainty Quantification

IF 5, Zone 1 Earth Sciences, JCR Q2 ENVIRONMENTAL SCIENCES

Water Resources Research, Pub Date: 2025-09-30, DOI: 10.1029/2024wr039831

Timothy K. Johnsen, Xiangyu Bi, Chunwei Chou, Charuleka Varadharajan, Yuxin Wu, Jonathan Skone, Lavanya Ramakrishnan


Citations: 0

Abstract

Machine learning (ML) methods applied in scientific research often deal with interrelated features in high‐dimensional data. Reducing data noise and redundancy is needed to increase prediction accuracy and efficiency, especially when dealing with data from field sensors. We explored an unsupervised learning method, the denoising autoencoder (DAE), to extract the underlying data structure from noisy raw data in the context of predicting hydrologic quantities from multiple field sensors. These sensors have intrinsic instrumental noise and occasional malfunctions that cause missing values. Our DAE neural network reconstructed meteorological sensor data containing noise and missing values to predict evapotranspiration (ET) in a mountainous watershed. The DAE reconstructed the sensor variables with a mean coefficient of determination of 0.77 across 15 dimensions representing individual sensors. It reduced variance and bias uncertainties compared to a classical autoencoder model. The reconstruction quality varied across dimensions depending on their cross‐correlation and alignment with the underlying data structure. Uncertainties arising from the model structure were overall higher than those resulting from data corruption. We attached the DAE structure to a downstream ET‐prediction neural network in three formats and achieved reasonably accurate ET predictions. The use of the DAE notably reduced variance uncertainty in ET prediction. However, excessive variance reduction may be accompanied by an increase in bias due to the intrinsic bias‐variance tradeoff. Our method of evaluating and reducing uncertainties in aggregated data from different sources can be used to improve predictive models, process understanding, and uncertainty quantification for better water resource management.
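To make the denoising idea concrete, the following is a minimal NumPy sketch, not the authors' model. It assumes synthetic data standing in for 15 correlated sensor channels, Gaussian noise plus random zeroing to mimic instrument noise and dropouts, and a single tanh hidden layer of 8 units; all function names, sizes, and rates are hypothetical choices for illustration. The network is trained to map corrupted inputs back to the clean values, and reconstruction is scored with a per-dimension coefficient of determination.

```python
import numpy as np

def make_sensor_data(rng, n=2000, d=15, k=3):
    """Synthetic stand-in for d correlated sensor channels (not the paper's data)."""
    z = rng.normal(size=(n, k))                      # k hidden drivers
    x = z @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))
    return (x - x.mean(axis=0)) / x.std(axis=0)      # standardize each channel

def corrupt(x, rng, noise_std=0.3, missing_rate=0.1):
    """Add Gaussian noise, then zero random entries to mimic missing readings."""
    noisy = x + noise_std * rng.normal(size=x.shape)
    keep = rng.random(x.shape) >= missing_rate
    return noisy * keep  # zero is the channel mean after standardization

def init_params(rng, d=15, h=8, scale=0.1):
    return {"W1": scale * rng.normal(size=(d, h)), "b1": np.zeros(h),
            "W2": scale * rng.normal(size=(h, d)), "b2": np.zeros(d)}

def forward(p, x):
    hidden = np.tanh(x @ p["W1"] + p["b1"])          # encoder
    return hidden, hidden @ p["W2"] + p["b2"]        # linear decoder

def train_step(p, x_corrupt, x_clean, lr=0.1):
    """One full-batch gradient step on the MSE between reconstruction and clean data."""
    hidden, recon = forward(p, x_corrupt)
    d_recon = 2.0 * (recon - x_clean) / x_clean.size
    d_hidden = (d_recon @ p["W2"].T) * (1.0 - hidden ** 2)
    grads = {"W2": hidden.T @ d_recon, "b2": d_recon.sum(axis=0),
             "W1": x_corrupt.T @ d_hidden, "b1": d_hidden.sum(axis=0)}
    for name in p:
        p[name] -= lr * grads[name]

def r2_per_dimension(x_true, x_pred):
    """Coefficient of determination computed separately for each channel."""
    ss_res = ((x_true - x_pred) ** 2).sum(axis=0)
    ss_tot = ((x_true - x_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def demo(seed=0, epochs=1000):
    rng = np.random.default_rng(seed)
    x = make_sensor_data(rng)
    x_eval = corrupt(x, rng)                         # fixed corrupted copy for scoring
    p = init_params(rng)
    mse_before = np.mean((forward(p, x_eval)[1] - x) ** 2)
    for _ in range(epochs):
        train_step(p, corrupt(x, rng), x)            # fresh corruption every epoch
    recon = forward(p, x_eval)[1]
    mse_after = np.mean((recon - x) ** 2)
    return mse_before, mse_after, r2_per_dimension(x, recon)

if __name__ == "__main__":
    before, after, r2 = demo()
    print("MSE before/after training:", before, after)
    print("mean R2 across channels:", r2.mean())
```

The sketch scores reconstruction on the same samples it trains on, which is enough to show the mechanics but would overstate quality in practice; a held-out split and repeated runs with different seeds would be needed before reading anything into the per-channel scores or into variance and bias estimates.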




Source journal

Water Resources Research (Environmental Science: Limnology)

CiteScore: 8.80

Self-citation rate: 13.00%

Articles published: 599

Review time: 3.5 months

Journal description: Water Resources Research (WRR) is an interdisciplinary journal that focuses on hydrology and water resources. It publishes original research in the natural and social sciences of water. It emphasizes the role of water in the Earth system, including physical, chemical, biological, and ecological processes in water resources research and management, as well as social, policy, and public health implications. It encompasses observational, experimental, theoretical, analytical, numerical, and data-driven approaches that advance the science of water and its management. Submissions are evaluated for the novelty, accuracy, significance, and broader implications of their findings.