Mind the information gap: How sampling and clustering impact the predictability of reach-scale channel types in California (USA)
Hervé Guillon, Belize Lane, Colin F. Byrne, Samuel Sandoval-Solis, Gregory B. Pasternack
Earth Surface Processes and Landforms, 49(14), 4610-4631. Published 2024-09-23. DOI: 10.1002/esp.5984 (https://onlinelibrary.wiley.com/doi/10.1002/esp.5984)
Abstract
Clustering and machine learning-based predictions are increasingly used for environmental data analysis and management. In fluvial geomorphology, examples include predicting channel types throughout a river network and segmenting river networks into a series of channel types, or groups of channel forms. However, when relevant information is unevenly distributed throughout a river network, the discrepancy between data-rich and data-poor locations creates an information gap. Combining clustering and predictions addresses this information gap, but its challenges and limitations remain poorly documented. This is especially true because predictions are often achieved with two approaches that differ meaningfully in how they process information: decision trees (e.g., RF: random forest) and deep learning (e.g., DNNs: deep neural networks). This creates challenges for downstream management decisions and for comparing clusters and predictions within or across study areas. To address this, we investigate the performance of RF and DNN with respect to the information gap between clustering data and prediction data. We use nine regional examples of clustering and predicting river channel types, stemming from a single clustering methodology applied in California, USA. Our results show that prediction performance decreases when the information gap between field-measured data and geospatial predictors increases. Furthermore, RF outperforms DNN, and their difference in performance decreases when the information gap between field-measured and geospatial data decreases. This suggests that mismatched scales between field-derived channel types and geospatial predictors hinder sequential information processing in DNN. Finally, our results highlight a sampling trade-off between uniformly capturing geomorphic variability and ensuring robust generalisation.
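The abstract describes a two-step workflow: field-measured reach data are clustered into channel types, and classifiers trained on geospatial predictors then extend those types to unsampled reaches. The Python sketch below illustrates that workflow on synthetic data under loudly stated assumptions: KMeans stands in for the study's clustering methodology, a small scikit-learn `MLPClassifier` stands in for a DNN, and the noise scale `gap` is a hypothetical proxy for the information gap, not the metric used in the paper. It compares the mean cross-validated accuracy of RF and the neural network as the predictors become less informative about the field data; it does not reproduce the paper's data or results.

```python
# Minimal sketch of a "cluster, then predict" workflow on synthetic data.
# Assumptions: KMeans, MLPClassifier and the `gap` noise knob are illustrative
# stand-ins, not the authors' pipeline or information-gap metric.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_sites = 300

# Hypothetical field-measured attributes at surveyed reaches
# (e.g., width, depth, slope, grain size).
field = rng.normal(size=(n_sites, 4))

# Step 1: cluster the standardized field data into channel types.
# KMeans is a stand-in; it is NOT the clustering methodology used in the study.
channel_type = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(field)
)


def compare_models(X, y, seed=0):
    """Step 2: mean 5-fold cross-validated accuracy of an RF and a small neural network."""
    models = {
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "DNN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed),
        ),
    }
    return {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}


# Hypothetical geospatial predictors: noisy proxies for three of the four field
# attributes. Larger `gap` means a weaker link between predictors and field data.
results = {}
for gap in (0.1, 1.0, 3.0):
    geo = field[:, :3] + gap * rng.normal(size=(n_sites, 3))
    results[gap] = compare_models(geo, channel_type)
    print(f"gap={gap}: " + ", ".join(f"{k}={v:.2f}" for k, v in results[gap].items()))
```

Increasing `gap` should generally lower accuracy for both models, which mirrors the qualitative trend the abstract reports. The relative RF/DNN ranking on synthetic data like this is not guaranteed and should not be read as evidence for the paper's findings.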
About the journal:
Earth Surface Processes and Landforms is an interdisciplinary international journal concerned with the interactions between surface processes, landforms and landscapes that lead to physical, chemical and biological changes and that, in turn, create current landscapes and the geological record of past landscapes. Its focus is core to both the physical geography and geology communities, and also to the wider geosciences.