Xuke Wu, Kun Shan, Lan Wang, Jingkai Wang, Mingsheng Shang
Ecological Informatics, Vol. 90, Article 103283. Published 15 June 2025. DOI: 10.1016/j.ecoinf.2025.103283
Spatiotemporal water quality data reconstruction: A tensor factorization framework
Automatic high-frequency monitoring (AHFM) of water quality parameters has gained growing attention for managing eutrophic lakes. However, missing data in water quality datasets remain a persistent challenge that often compromises the reliability of mathematical models and statistical analyses. Because traditional imputation methods fail to adequately capture the complex spatiotemporal dependencies among water quality variables, this study proposes a novel nonnegative tensor factorization (NTF) model that reconstructs missing values by modeling the three-way interactions among variables, sites, and time. Previous findings indicate that incorporating bias schemes into NTF architectures substantially reduces the risk of underfitting. Building on this insight, we develop and rigorously evaluate seven distinct biased NTF variants. Their diverse bias-term designs not only improve individual model performance but also provide complementary strengths that make ensemble learning highly effective. To validate the proposed models, we conduct comprehensive experiments on real-world AHFM data from Lake Dianchi, China, under various missing-data scenarios (missing ratios of 20–80% and missing gaps of 1–4 weeks). The key water quality parameters are chlorophyll-a concentration, water temperature, pH, dissolved oxygen, electrical conductivity, turbidity, chemical oxygen demand, ammonia, total phosphorus, and total nitrogen. The results demonstrate the superiority of the seven biased NTF models, with the best performance reaching a root mean squared error (RMSE) of 0.2796 ± 0.0041, a mean absolute error (MAE) of 0.1611 ± 0.0034, and a Nash-Sutcliffe efficiency (NSE) of 0.9704 ± 0.0009 across all missingness scenarios. Compared with state-of-the-art models, these methods yield consistent improvements of 3.42–30.74% in RMSE, 2.30–30.38% in MAE, and 0.20–3.22% in NSE. Notably, an ensemble of the seven models further improves imputation accuracy, attaining an RMSE of 0.2409 ± 0.0018, an MAE of 0.1384 ± 0.0012, and an NSE of 0.9768 ± 0.0009. These findings underscore the potential of bias-enhanced NTF frameworks as a robust tool for analyzing high-dimensional monitoring data.
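The abstract does not give the model equations or define the seven bias schemes. As a rough illustration only, the sketch below fits one hypothetical biased NTF variant to a variable × site × time tensor: a nonnegative CP term plus one bias per mode, trained only on observed entries with weighted multiplicative updates (a Lee-Seung-style rule swapped in here for illustration). The function name `fit_biased_ntf`, the per-mode bias design, the L2 penalty `lam`, and the update rule are all assumptions, not the authors' method. It also assumes the data have been rescaled to be nonnegative.

```python
import numpy as np


def fit_biased_ntf(Y, W, rank=3, lam=1e-3, n_iter=500, seed=0, eps=1e-12):
    """Hypothetical biased NTF imputer (a sketch, not the paper's model).

    Y: nonnegative array (variables, sites, times); missing entries may be NaN.
    W: same shape, 1.0 where Y is observed and 0.0 where it is missing.
    Assumed model: y_hat[i,j,k] = sum_r A[i,r]*B[j,r]*C[k,r] + a[i] + b[j] + c[k],
    with every factor and bias kept nonnegative. Returns the completed tensor.
    """
    I, J, K = Y.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.uniform(0.1, 1.0, (n, rank)) for n in (I, J, K))
    a, b, c = (rng.uniform(0.1, 1.0, n) for n in (I, J, K))
    WY = W * np.nan_to_num(Y)  # NaNs become 0, then the mask zeroes them anyway

    def predict():
        # CP reconstruction plus one broadcast bias per mode
        return (np.einsum("ir,jr,kr->ijk", A, B, C)
                + a[:, None, None] + b[None, :, None] + c[None, None, :])

    for _ in range(n_iter):
        # Multiplicative update per block: old * (W*Y term) / (W*Y_hat term + L2 term).
        # Numerator and denominator are nonnegative, so each block stays nonnegative.
        WH = W * predict()
        A *= np.einsum("ijk,jr,kr->ir", WY, B, C) / (
            np.einsum("ijk,jr,kr->ir", WH, B, C) + lam * A + eps)
        WH = W * predict()
        B *= np.einsum("ijk,ir,kr->jr", WY, A, C) / (
            np.einsum("ijk,ir,kr->jr", WH, A, C) + lam * B + eps)
        WH = W * predict()
        C *= np.einsum("ijk,ir,jr->kr", WY, A, B) / (
            np.einsum("ijk,ir,jr->kr", WH, A, B) + lam * C + eps)
        # Bias updates: each bias sums the residual terms over the other two modes
        WH = W * predict()
        a *= WY.sum(axis=(1, 2)) / (WH.sum(axis=(1, 2)) + lam * a + eps)
        WH = W * predict()
        b *= WY.sum(axis=(0, 2)) / (WH.sum(axis=(0, 2)) + lam * b + eps)
        WH = W * predict()
        c *= WY.sum(axis=(0, 1)) / (WH.sum(axis=(0, 1)) + lam * c + eps)
    return predict()
```

Missing entries are read from the returned tensor at positions where `W == 0`. In this sketch the bias terms let the CP factors model only the interaction structure, leaving per-variable, per-site, and per-time offsets to the biases. That is one plausible reading of why biases reduce underfitting, not a claim about the paper's variants.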
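The experiments cover two kinds of missingness: entries removed at random (20–80% missing ratios) and contiguous gaps of 1–4 weeks. The abstract does not state the sampling interval or how gaps are placed, so the sketch below makes loud assumptions. The helper names `random_mask` and `gap_mask` are hypothetical, `steps_per_day=24` assumes hourly data, and the gap rule (one contiguous gap per variable-site series at a random start) is an illustrative choice. In both masks, 1.0 means observed and 0.0 means missing.

```python
import numpy as np


def random_mask(shape, missing_ratio, seed=0):
    """Hypothetical helper: hide exactly round(missing_ratio * N) entries at random."""
    rng = np.random.default_rng(seed)
    size = int(np.prod(shape))
    mask = np.ones(size)
    drop = rng.choice(size, int(round(missing_ratio * size)), replace=False)
    mask[drop] = 0.0
    return mask.reshape(shape)


def gap_mask(shape, gap_weeks, steps_per_day=24, seed=0):
    """Hypothetical helper: hide one contiguous block of gap_weeks in every
    (variable, site) time series. steps_per_day=24 assumes hourly sampling."""
    rng = np.random.default_rng(seed)
    n_var, n_site, n_time = shape
    gap = gap_weeks * 7 * steps_per_day
    if gap >= n_time:
        raise ValueError("gap is longer than the time axis")
    mask = np.ones(shape)
    for i in range(n_var):
        for j in range(n_site):
            # integers() excludes the upper bound, so start + gap <= n_time
            start = rng.integers(0, n_time - gap + 1)
            mask[i, j, start:start + gap] = 0.0
    return mask
```

Drawing an exact count of missing entries, instead of thresholding independent coin flips, keeps the realized missing ratio identical across repeated runs. This makes scenario-to-scenario comparisons easier to read.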
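The results are reported as RMSE, MAE, and NSE, with improvements expressed as percentages and an ensemble of the seven models. A minimal sketch of these computations follows. The abstract does not say how the models are combined or how the percentages are computed, so two things here are assumptions. First, `ensemble_mean` uses equal-weight averaging. Second, `improvement_pct` uses relative change with respect to the baseline.

```python
import numpy as np


def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def mae(obs, sim):
    return float(np.mean(np.abs(obs - sim)))


def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than mean(obs)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2))


def ensemble_mean(predictions):
    # Assumed combination rule: equal-weight average of the models' imputations
    return np.mean(np.stack(predictions), axis=0)


def improvement_pct(base, new, lower_is_better=True):
    # Assumed convention: relative change versus the baseline, in percent
    delta = base - new if lower_is_better else new - base
    return 100.0 * delta / abs(base)


obs = np.array([1.0, 2.0, 3.0, 4.0])
p1 = np.array([1.2, 1.8, 3.2, 3.8])  # errors of +/-0.2
p2 = np.array([0.8, 2.2, 2.8, 4.2])  # opposite-sign errors of +/-0.2
print(rmse(obs, p1), rmse(obs, ensemble_mean([p1, p2])))
```

In this toy case, each model alone has an RMSE of about 0.2. Their errors have opposite signs, so the average cancels them almost exactly. This is an idealized picture of the "complementary strengths" that make ensembling useful. Applied to the abstract's figures under the assumed convention, `improvement_pct(0.2796, 0.2409)` is roughly 13.8, meaning the ensemble RMSE is about 13.8% lower than the best single-model RMSE.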
Journal description:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.