ARCHI: A New R Package for Automated Imputation of Regionally Correlated Hydrologic Records
Zeno F Levy, Robin L Glas, Timothy J Stagnitta, Neil Terry
Ground Water (journal article), published 2025-02-28. DOI: 10.1111/gwat.13474
Abstract
Missing data in hydrological records can limit resource assessment, process understanding, and predictive modeling. Here, we present ARCHI (Automated Regional Correlation Analysis for Hydrologic Record Imputation), a new, open-source software package in R designed to aggregate, impute, cluster, and visualize regionally correlated hydrologic records. ARCHI imputes missing data in "target" records by linear regression using more complete "reference" records as predictors. Automated imputation is implemented using a novel, iterative algorithm that allows each site to be considered a target or reference for regression, growing the pool of complete references with each imputed record until viable gap-filling ceases. Users can limit artifacts from spurious correlations by specifying model-acceptance criteria and applying geospatial, correlation, and group-based filters to control reference selection. ARCHI provides additional functions for visualizing results, clustering records with similar correlation structures, evaluating holdout data, and parameterizing models interactively through an accessible and intuitive graphical user interface (GUI). This methods brief provides an overview of the ARCHI package, modeling guidelines, and benchmarking on two regional groundwater-level datasets from the Central Valley, CA, and Long Island, NY. We evaluate ARCHI alongside widely used multivariate imputation software to highlight and contextualize its computational efficiency, imputation accuracy, and model transparency when applied to large groundwater-level datasets.
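The iterative target/reference idea can be illustrated with a minimal Python sketch. This is not ARCHI's R API or its exact algorithm: the function names `fit_line` and `iterative_impute`, the choice of a single best-correlated reference per target, and the parameters `min_r2` (a stand-in for a model-acceptance criterion), `min_overlap`, and `max_dist` (a stand-in for a geospatial filter) are all hypothetical. The sketch also assumes every record is aligned on one shared time index, with `None` marking a missing value.

```python
import math
from typing import Dict, List, Optional, Tuple

Series = List[Optional[float]]  # one value per shared time step; None = missing


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return my, 0.0, 0.0  # degenerate case: no usable linear relationship
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)


def iterative_impute(records: Dict[str, Series],
                     coords: Dict[str, Tuple[float, float]],
                     min_r2: float = 0.8,        # hypothetical acceptance threshold
                     min_overlap: int = 5,       # hypothetical minimum paired points
                     max_dist: float = math.inf  # hypothetical geospatial filter
                     ) -> Dict[str, Series]:
    """Fill gaps in target records from the best-fitting complete reference.

    Each filled record joins the reference pool, and passes repeat until a
    full pass fills nothing (i.e., viable gap-filling has ceased).
    """
    data = {site: list(vals) for site, vals in records.items()}  # copy inputs
    # Initial reference pool: records with no missing values.
    refs = {s for s, v in data.items() if all(x is not None for x in v)}
    changed = True
    while changed:
        changed = False
        for target, y in data.items():
            if target in refs:
                continue
            best = None  # (r2, intercept, slope, reference site)
            for ref in refs:
                if math.dist(coords[target], coords[ref]) > max_dist:
                    continue  # reference is too far away
                pairs = [(xv, yv) for xv, yv in zip(data[ref], y) if yv is not None]
                if len(pairs) < max(min_overlap, 2):
                    continue  # too little overlap to fit a line
                xs, ys = zip(*pairs)
                a, b, r2 = fit_line(list(xs), list(ys))
                if r2 >= min_r2 and (best is None or r2 > best[0]):
                    best = (r2, a, b, ref)
            if best is None:
                continue  # no reference meets the acceptance criteria
            _, a, b, ref = best
            # The reference is complete, so every gap in the target gets a value.
            data[target] = [yv if yv is not None else a + b * xv
                            for xv, yv in zip(data[ref], y)]
            refs.add(target)  # the newly completed record becomes a reference
            changed = True
    return data
```

In a toy run with sites placed along a line and `max_dist=1.5`, a site two units from the only originally complete record cannot use it directly, but it can still be filled once an intermediate neighbor has been imputed and added to the pool. A site that has no reference within range keeps its gaps. Because records completed during a pass become references immediately, the outcome can depend on the order in which targets are visited; a production tool would need an explicit rule for that ordering.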