Mind the information gap: How sampling and clustering impact the predictability of reach-scale channel types in California (USA)

IF 2.8 | Zone 3 (Earth Sciences) | JCR Q2, GEOGRAPHY, PHYSICAL

Earth Surface Processes and Landforms | Pub Date: 2024-09-23 | DOI: 10.1002/esp.5984

Hervé Guillon, Belize Lane, Colin F. Byrne, Samuel Sandoval-Solis, Gregory B. Pasternack

Volume 49, Issue 14, pp. 4610-4631

Citations: 0

Abstract

Clustering and machine learning-based predictions are increasingly used for environmental data analysis and management. In fluvial geomorphology, examples include predicting channel types throughout a river network and segmenting river networks into a series of channel types, or groups of channel forms. However, when relevant information is unevenly distributed throughout a river network, the discrepancy between data-rich and data-poor locations creates an information gap. Combining clustering and predictions addresses this information gap, but challenges and limitations remain poorly documented. This is especially true because predictions are often made with two approaches that differ meaningfully in how they process information: decision trees (e.g., RF: random forest) and deep learning (e.g., DNNs: deep neural networks). This complicates downstream management decisions, as well as comparisons of clusters and predictions within or across study areas. To address this, we investigate the performance of RF and DNN with respect to the information gap between clustering data and prediction data. We use nine regional examples of clustering and predicting river channel types, stemming from a single clustering methodology applied in California, USA. Our results show that prediction performance decreases when the information gap between field-measured data and geospatial predictors increases. Furthermore, RF outperforms DNN, and their difference in performance decreases when the information gap between field-measured and geospatial data decreases. This suggests that mismatched scales between field-derived channel types and geospatial predictors hinder sequential information processing in DNN. Finally, our results highlight a sampling trade-off between uniformly capturing geomorphic variability and ensuring robust generalisation.




Source journal

Earth Surface Processes and Landforms (Geosciences: Earth Sciences, Multidisciplinary)

CiteScore: 6.40

Self-citation rate: 12.10%

Articles published: 215

Review time: 4 months

Journal description: Earth Surface Processes and Landforms is an interdisciplinary international journal concerned with the interactions between surface processes and landforms and landscapes that lead to physical, chemical and biological changes, and which in turn create current landscapes and the geological record of past landscapes. Its focus is core to both the physical geography and geology communities, and also to the wider geosciences.