{"title":"重新评估寻床器的预测能力:对冰下床形态检测的机器学习的见解-对“使用机器学习自动识别流线型冰下床形态:开源Python方法”的评论","authors":"Ming Li, Huanyu Zhao, Tianfei Yu","doi":"10.1111/bor.70005","DOIUrl":null,"url":null,"abstract":"<p>The integration of machine learning (ML) into geomorphological research presents significant opportunities for automating the identification of streamlined subglacial bedforms. In their study, Abrahams <i>et al</i>. (<span>2024</span>) introduce bedfinder, an open-source Python tool designed to detect subglacial features with high efficiency. While the tool demonstrates promise, this discussion highlights critical challenges and areas for refinement. The representativeness of the training data set, dominated by sedimentary bed conditions, limits the model's generalizability to regions with diverse bedrock compositions. Additionally, the reliance on binary classifications oversimplifies complex geomorphic settings, reducing the model's adaptability. Performance metrics such as F1 scores, though favourable, warrant cautious interpretation due to class imbalances that may skew predictions. Furthermore, the integration of filtering techniques, while enhancing precision, raises concerns about potential biases from manual data curation. To enhance scientific rigour, future efforts should incorporate diverse data sets, conduct comprehensive evaluations of filtering methods, and explore modular approaches for greater applicability. Addressing these challenges will not only strengthen bedfinder but also contribute to the evolving role of ML in advancing glacial and geomorphological research. This contribution provides a constructive critique to guide future improvements and interdisciplinary applications of this innovative tool.</p><p>Abrahams <i>et al</i>. (<span>2024</span>) present a cutting-edge approach to the identification of streamlined subglacial bedforms, which are crucial in understanding past glacial dynamics and their impact on geomorphology. 
The authors have developed an open-source Python tool, bedfinder, that employs supervised ML algorithms – Random Forest and XGBoost – to automate the identification of these features across deglaciated landscapes. This tool is an ambitious attempt to enhance efficiency and accuracy in a domain traditionally burdened by labour-intensive, subjective, and time-consuming manual processes. The authors also provide a thorough validation of the model using an extensive data set of known subglacial bedforms from various regions in the Northern Hemisphere.</p><p>While the study contributes valuable insights into how ML can automate the mapping of complex glacial features, several critical issues arise from the selection of data, the application of ML methods, and the interpretation of results. Here, we will highlight some of these issues and offer suggestions to enhance the scientific robustness of the study and its potential applications. The concerns outlined here include the representativeness of the training data, the challenges in interpreting the performance metrics, and the potential over-simplifications in model design and validation.</p><p>A central issue in Abrahams <i>et al</i>. (<span>2024</span>) lies in the selection of the training data. The authors utilize a data set derived primarily from sedimentary bed conditions, which introduces inherent biases in the model's predictive capabilities. Subglacial bedforms, such as streamlined ridges and grooves, are deeply influenced by a variety of factors, including the underlying bedrock composition, topography, and glacial dynamics. As such, the diversity of the landscape in which these features occur plays a crucial role in determining the model's ability to generalize across different glacial environments. The data set employed in this study, however, is heavily weighted towards sedimentary beds, with only limited representation of crystalline bed conditions.
This imbalance limits the model's applicability to regions that predominantly feature crystalline or mixed bedrock, thereby potentially reducing its accuracy when applied to environments where these conditions prevail. As a result, the model may struggle to accurately detect and classify subglacial bedforms in these underrepresented areas, which could lead to misinterpretations of the underlying geomorphological processes.</p><p>The sedimentary dominance in the training data implies that the model is optimized for environments where glacial erosion is more pronounced and where the bedforms are likely to exhibit greater contrast in relief. However, such conditions may not accurately reflect the range of bedform characteristics found in other glacial terrains. For example, glacial features formed over crystalline bedrock are often more subdued in terms of their topographic expression, and they may exhibit unique forms not adequately represented by the existing data set (Skyttä <i>et al</i>. <span>2023</span>; Courtney-Davies <i>et al</i>. <span>2024</span>). As a result, the model may struggle to detect such features with high accuracy, leading to underreporting or misclassification of subglacial bedforms in non-sedimentary terrains.</p><p>Furthermore, the simplified binary classification of bed conditions into sedimentary vs. crystalline, and topography into constrained vs. unconstrained categories, oversimplifies the inherent complexity of glacial landscapes. Real-world geomorphological systems often exhibit a continuum of bedrock and topographic types, with gradual transitions between these categories (Tani <span>2013</span>; Stewart & Jamieson <span>2018</span>). The choice to categorize bed conditions so rigidly ignores the possibility of more complex hybrid environments, such as regions where bedrock characteristics and topographic constraints exist in subtle or transitional forms. 
This approach thus limits the flexibility of the model to handle areas that do not fit neatly into these predefined categories.</p><p>In fact, glacial bedforms in areas with mixed or transitional bedrock may behave differently under the same climatic conditions (Oetting <i>et al</i>. <span>2022</span>; Olesen <i>et al</i>. <span>2023</span>). For instance, areas where sedimentary and crystalline bedrocks coexist may exhibit interactions between ice flow, erosion, and deposition that do not fit the typical patterns observed in homogeneous environments (Xie <i>et al</i>. <span>2022</span>; Huang <i>et al</i>. <span>2023</span>). Such regions could present challenges for the model, as the unique interplay between geological and topographical factors would not be captured by the limited training data set.</p><p>The solution to this issue lies in expanding the training data set to better represent the diversity of glacial environments. Future iterations of the model should incorporate data sets from regions that exhibit a wider variety of bedrock types (including volcanic, metamorphic and mixed compositions), as well as topographic features that span the entire spectrum from constrained to unconstrained settings. By integrating data from areas such as the Arctic, Antarctic, and other high-latitude glaciated regions, the model would gain a broader understanding of how subglacial bedforms manifest under different conditions. Additionally, the classification system could be refined to capture the nuanced variations that exist within different bedrock and topographic types. Instead of relying on a binary classification, the authors could develop a multi-tiered system that includes subcategories for mixed or transitional environments. This would allow for more precise model predictions in regions where the interaction of various geological and topographical factors is crucial. 
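</p><p>A multi-tiered labelling scheme of the kind suggested above is straightforward to prototype. The sketch below is purely illustrative: the crystalline_fraction attribute and the tier boundaries are hypothetical, not part of bedfinder:</p>

```python
def bed_condition_tier(crystalline_fraction: float) -> str:
    """Map a continuous bedrock-composition estimate to a tiered label.

    crystalline_fraction is a hypothetical per-cell estimate of the
    proportion of crystalline bedrock (0.0 = purely sedimentary,
    1.0 = purely crystalline); the cut-offs are illustrative only.
    """
    if not 0.0 <= crystalline_fraction <= 1.0:
        raise ValueError("crystalline_fraction must lie in [0, 1]")
    if crystalline_fraction < 0.2:
        return "sedimentary"
    if crystalline_fraction < 0.8:
        return "mixed/transitional"
    return "crystalline"

# A binary scheme would force a 50/50 cell into one end-member class;
# the tiered scheme keeps the transitional setting explicit.
print(bed_condition_tier(0.1))   # sedimentary
print(bed_condition_tier(0.5))   # mixed/transitional
print(bed_condition_tier(0.95))  # crystalline
```

<p>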
Moreover, incorporating region-specific metadata and environmental variables, such as glacial flow dynamics, sediment transport characteristics, and variations in ice-sheet thickness, could enhance the predictive power of the model. These variables would provide the necessary context for interpreting the specific glacial history and bedform characteristics of different regions, thus enabling the model to adjust its predictions based on the unique environmental conditions of each location.</p><p>Abrahams <i>et al</i>. (<span>2024</span>) report impressive performance metrics, with F1 scores exceeding 94% for both Random Forest and the ensemble models. However, these figures must be viewed in light of the significant class imbalance present in the training data. The data set contains over 600 000 positive-relief features, but only a small proportion of these are identified as true subglacial bedforms. This creates an inherent imbalance between the classes of interest (glacial bedforms) and non-glacial features (e.g. rock outcrops, sedimentary anomalies, etc.), which has the potential to skew model performance.</p><p>In ML tasks involving class imbalance, standard evaluation metrics like accuracy can be misleading, as models can achieve high accuracy simply by predicting the dominant class (in this case, ‘non-bedform’ features) without ever identifying true bedforms. To account for this, the authors use additional metrics, such as precision, recall, and the F1 score, which better reflect the model's ability to detect true positives while minimizing false positives and false negatives. While these metrics indicate strong model performance, the authors do not sufficiently address the influence of the class imbalance on these results.</p><p>The authors correctly point out that the model's tendency to overpredict false positives (especially in out-of-distribution (OOD) regions) is a result of prioritizing the detection of bedforms rather than avoiding false positives.
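</p><p>The accuracy pitfall under class imbalance described above can be made concrete with a toy confusion-matrix calculation (the counts below are invented for illustration and are not taken from the study):</p>

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data set: 1000 landforms, of which only 50 are true bedforms (5%).
# A degenerate model that labels everything 'non-bedform':
tp, fp, fn, tn = 0, 0, 50, 950
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"accuracy={accuracy:.2f}  precision={precision:.2f}  "
      f"recall={recall:.2f}  F1={f1:.2f}")
# accuracy is 0.95 even though not a single bedform was found, while
# recall and F1 are 0.0 - hence the preference for those metrics.
```

<p>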
While this approach may be acceptable for applications where missing a bedform is more problematic than misclassifying a non-bedform, it remains essential to fully understand the implications of such decisions. False positives – where non-glacial features are misclassified as bedforms – could significantly alter the interpretation of a landscape, leading to incorrect assumptions about past ice-flow dynamics and glacial behaviour.</p><p>Furthermore, the false negative rate, while implicitly addressed through the model's emphasis on recovering more bedforms, is not sufficiently explored. A higher false positive rate may be preferable in certain circumstances, but the trade-off with false negatives needs careful evaluation. For example, in a region with complex geomorphology, the presence of non-glacial features might lead to substantial false positive detections, which could undermine the scientific value of the model's predictions.</p><p>To improve the robustness of the results, it would be beneficial to conduct a more comprehensive evaluation of the trade-offs between false positives and false negatives. The authors could employ a precision-recall curve (Saito & Rehmsmeier <span>2017</span>; Fu <i>et al</i>. <span>2019</span>; Williams <span>2021</span>) to assess the model's performance in greater detail. This would allow for a better understanding of how different threshold values for classification impact the balance between precision and recall, ultimately providing a clearer picture of the model's practical utility in various contexts. In addition, applying cross-validation techniques using different training and test sets would provide a more rigorous assessment of the model's generalizability. 
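</p><p>The threshold trade-off that such a precision–recall curve summarizes can be sketched in a few lines; the classifier scores and labels below are invented for illustration:</p>

```python
def pr_at_threshold(scores, labels, threshold):
    """Precision/recall when every score >= threshold is called a bedform."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented classifier scores (estimated probability of 'bedform') and
# the corresponding ground-truth labels (1 = bedform, 0 = non-bedform).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.30, 0.20]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

# Sweeping the threshold traces out the precision-recall curve: a low
# threshold recovers more bedforms (higher recall) at the price of more
# false positives (lower precision), and vice versa.
for t in (0.25, 0.50, 0.75):
    p, r = pr_at_threshold(scores, labels, t)
    print(f"threshold={t:.2f}: precision={p:.2f}, recall={r:.2f}")
```

<p>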
Since the data set is likely to contain spatial and temporal dependencies, the authors could consider employing spatial cross-validation, where training and test sets are split based on geographical regions rather than random selection (Allen & Kim <span>2020</span>; Palm <i>et al</i>. <span>2023</span>). This approach would more accurately capture the model’s ability to generalize across different spatial scales and environmental conditions, further validating its effectiveness for real-world applications.</p><p>The authors introduce a scientifically driven filtering method, referred to as McKenzie <i>et al</i>. (<span>2022</span>) filtering, which aims to narrow the data set to only those features that are likely to be streamlined subglacial bedforms. This method is intended to mitigate the errors associated with false positives and non-glacial features, ensuring that the model is trained on a high-quality subset of the data. However, while filtering improves model performance by excluding spurious features, it introduces a level of subjectivity into the process that warrants further scrutiny.</p><p>The reliance on manual filtering – albeit guided by scientific principles – can introduce biases into the data set that are difficult to quantify. For example, the decision to exclude certain bedform types or morphologies based on subjective criteria may unintentionally eliminate features that are of scientific interest or that challenge conventional definitions of subglacial bedforms. As a result, the model may be trained on a data set that is more homogeneous than the actual diversity of features encountered in the field.</p><p>Moreover, the combination of filtering methods – such as McKenzie filtering alongside Near Miss – deserves more attention in terms of its effectiveness. The authors suggest that the integration of these methods results in improved detection accuracy, but the rationale for this combination is not fully explained. 
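</p><p>For context, the statistical half of that combination can be sketched in isolation. The following is a simplified, pure-Python stand-in for the Near Miss family of undersamplers (not the authors' implementation; real pipelines would typically use a library such as imbalanced-learn):</p>

```python
import math

def near_miss_undersample(majority, minority, keep):
    """Simplified Near Miss-style undersampling.

    Retains the `keep` majority-class samples whose mean Euclidean
    distance to the minority class is smallest, so the negatives kept
    for training sit near the decision boundary.
    """
    def mean_dist(point):
        return sum(math.dist(point, m) for m in minority) / len(minority)
    return sorted(majority, key=mean_dist)[:keep]

# Toy 2-D feature vectors: minority = bedforms, majority = non-bedforms.
bedforms = [(0.0, 0.0), (1.0, 1.0)]
non_bedforms = [(0.5, 0.5), (5.0, 5.0), (0.2, 0.8), (9.0, 1.0)]

kept = near_miss_undersample(non_bedforms, bedforms, keep=2)
print(kept)  # the two negatives nearest the bedform cluster survive
```

<p>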
Is it the scientific knowledge embedded in McKenzie filtering that enhances model performance, or is it the statistical adjustment provided by Near Miss? The authors do not sufficiently discuss how these methods interact, which makes it difficult to assess the true impact of their combined use.</p><p>To enhance the scientific rigour of the filtering process, the authors could benefit from conducting a more transparent comparison of different filtering strategies. Ablation studies (Cosmo <i>et al</i>. <span>2024</span>; Tseng <i>et al</i>. <span>2024</span>) could be employed to evaluate the independent contribution of each filtering method and determine whether the combination of McKenzie filtering and Near Miss is truly optimal. Additionally, the authors could test the model's performance on unfiltered data sets to assess whether the added complexity of filtering results in a tangible improvement in predictive accuracy or merely serves to reduce the diversity of features considered. In the future, incorporating a flexible, data-driven filtering mechanism – perhaps informed by unsupervised clustering or density-based spatial analysis – could help automate the feature selection process while reducing potential biases introduced by manual choices. This would allow the model to adjust to new data without overfitting to pre-existing assumptions.</p><p>Abrahams <i>et al</i>. (<span>2024</span>) present a significant advance in automating the identification of streamlined subglacial bedforms, a task traditionally dominated by manual methods that are prone to errors and biases. While the study demonstrates the potential of machine learning to transform geomorphology, several important challenges remain. These include issues related to data selection, model performance interpretation, and the integration of scientific filtering methods. By addressing these concerns, the authors could significantly improve the robustness and generalizability of the tool. 
Future iterations of bedfinder could benefit from expanding the training data set, refining the model's ability to handle class imbalance, and introducing a more flexible filtering mechanism.</p><p>With these improvements, bedfinder could become a cornerstone for future research in glacial geomorphology, offering a powerful, reproducible and accessible tool for understanding past ice dynamics and advancing the field of glaciology.</p><p>ML: writing – original draft. HZ: formal analysis. TY: conceptualization, supervision, funding acquisition, writing – review and editing.</p>

Boreas 54(2): 273–276, published 17 March 2025. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/bor.70005
Reassessing the predictive power of bedfinder: insights into machine learning for subglacial bedform detection – Comments on ‘Automatic identification of streamlined subglacial bedforms using machine learning: an open-source Python approach’
The integration of machine learning (ML) into geomorphological research presents significant opportunities for automating the identification of streamlined subglacial bedforms. In their study, Abrahams et al. (2024) introduce bedfinder, an open-source Python tool designed to detect subglacial features with high efficiency. While the tool demonstrates promise, this discussion highlights critical challenges and areas for refinement. The representativeness of the training data set, dominated by sedimentary bed conditions, limits the model's generalizability to regions with diverse bedrock compositions. Additionally, the reliance on binary classifications oversimplifies complex geomorphic settings, reducing the model's adaptability. Performance metrics such as F1 scores, though favourable, warrant cautious interpretation due to class imbalances that may skew predictions. Furthermore, the integration of filtering techniques, while enhancing precision, raises concerns about potential biases from manual data curation. To enhance scientific rigour, future efforts should incorporate diverse data sets, conduct comprehensive evaluations of filtering methods, and explore modular approaches for greater applicability. Addressing these challenges will not only strengthen bedfinder but also contribute to the evolving role of ML in advancing glacial and geomorphological research. This contribution provides a constructive critique to guide future improvements and interdisciplinary applications of this innovative tool.
Abrahams et al. (2024) present a cutting-edge approach to the identification of streamlined subglacial bedforms, which are crucial in understanding past glacial dynamics and their impact on geomorphology. The authors have developed an open-source Python tool, bedfinder, that employs supervised ML algorithms – Random Forest and XGBoost – to automate the identification of these features across deglaciated landscapes. This tool is an ambitious attempt to enhance efficiency and accuracy in a domain traditionally burdened by labour-intensive, subjective, and time-consuming manual processes. The authors also provide a thorough validation of the model using an extensive data set of known subglacial bedforms from various regions in the Northern Hemisphere.
While the study contributes valuable insights into how ML can automate the mapping of complex glacial features, several critical issues arise from the selection of data, the application of ML methods, and the interpretation of results. Here, we will highlight some of these issues and offer suggestions to enhance the scientific robustness of the study and its potential applications. The concerns outlined here include the representativeness of the training data, the challenges in interpreting the performance metrics, and the potential over-simplifications in model design and validation.
A central issue in Abrahams et al. (2024) lies in the selection of the training data. The authors utilize a data set derived primarily from sedimentary bed conditions, which introduces inherent biases in the model's predictive capabilities. Subglacial bedforms, such as streamlined ridges and grooves, are deeply influenced by a variety of factors, including the underlying bedrock composition, topography, and glacial dynamics. As such, the diversity of the landscape in which these features occur plays a crucial role in determining the model's ability to generalize across different glacial environments. The data set employed in this study, however, is heavily weighted towards sedimentary bed, with only limited representation of crystalline bed conditions. This imbalance limits the model's applicability to regions that predominantly feature crystalline or mixed bedrock, thereby potentially reducing its accuracy when applied to environments where these conditions prevail. As a result, the model may struggle to accurately detect and classify subglacial bedforms in these underrepresented areas, which could lead to misinterpretations of the underlying geomorphological processes.
The sedimentary dominance in the training data implies that the model is optimized for environments where glacial erosion is more pronounced and where the bedforms are likely to exhibit greater contrast in relief. However, such conditions may not accurately reflect the range of bedform characteristics found in other glacial terrains. For example, glacial features formed over crystalline bedrock are often more subdued in terms of their topographic expression, and they may exhibit unique forms not adequately represented by the existing data set (Skyttä et al. 2023; Courtney-Davies et al. 2024). As a result, the model may struggle to detect such features with high accuracy, leading to underreporting or misclassification of subglacial bedforms in non-sedimentary terrains.
Furthermore, the simplified binary classification of bed conditions into sedimentary vs. crystalline, and topography into constrained vs. unconstrained categories, oversimplifies the inherent complexity of glacial landscapes. Real-world geomorphological systems often exhibit a continuum of bedrock and topographic types, with gradual transitions between these categories (Tani 2013; Stewart & Jamieson 2018). The choice to categorize bed conditions so rigidly ignores the possibility of more complex hybrid environments, such as regions where bedrock characteristics and topographic constraints exist in subtle or transitional forms. This approach thus limits the flexibility of the model to handle areas that do not fit neatly into these predefined categories.
In fact, glacial bedforms in areas with mixed or transitional bedrock may behave differently under the same climatic conditions (Oetting et al. 2022; Olesen et al. 2023). For instance, areas where sedimentary and crystalline bedrocks coexist may exhibit interactions between ice flow, erosion, and deposition that do not fit the typical patterns observed in homogeneous environments (Xie et al. 2022; Huang et al. 2023). Such regions could present challenges for the model, as the unique interplay between geological and topographical factors would not be captured by the limited training data set.
The solution to this issue lies in expanding the training data set to better represent the diversity of glacial environments. Future iterations of the model should incorporate data sets from regions that exhibit a wider variety of bedrock types (including volcanic, metamorphic and mixed compositions), as well as topographic features that span the entire spectrum from constrained to unconstrained settings. By integrating data from areas such as the Arctic, Antarctic, and other high-latitude glaciated regions, the model would gain a broader understanding of how subglacial bedforms manifest under different conditions. Additionally, the classification system could be refined to capture the nuanced variations that exist within different bedrock and topographic types. Instead of relying on a binary classification, the authors could develop a multi-tiered system that includes subcategories for mixed or transitional environments. This would allow for more precise model predictions in regions where the interaction of various geological and topographical factors is crucial. Moreover, incorporating regional-specific metadata and environmental variables, such as glacial flow dynamics, sediment transport characteristics, and variations in ice-sheet thickness, could enhance the predictive power of the model. These variables would provide the necessary context for interpreting the specific glacial history and bedform characteristics of different regions, thus enabling the model to adjust its predictions based on the unique environmental conditions of each location.
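One way such a multi-tiered scheme could be operationalized is to replace the binary bed-condition flag with a graded encoding that admits mixed or transitional settings. The sketch below is purely illustrative: the category names, numeric grades, and the `encode_site` helper are hypothetical and are not part of bedfinder.

```python
# Hypothetical graded encoding of bed condition, replacing a binary
# sedimentary/crystalline flag; categories and values are invented.
BED_CONDITION = {
    "sedimentary": 1.0,
    "mostly_sedimentary": 0.75,
    "mixed": 0.5,
    "mostly_crystalline": 0.25,
    "crystalline": 0.0,
}

def encode_site(bed, constrained_fraction):
    """Return a feature vector: graded bed condition plus the fraction
    of the site's area that is topographically constrained (0-1)."""
    if bed not in BED_CONDITION:
        raise ValueError(f"unknown bed condition: {bed}")
    if not 0.0 <= constrained_fraction <= 1.0:
        raise ValueError("constrained_fraction must be in [0, 1]")
    return [BED_CONDITION[bed], constrained_fraction]

# A transitional site: mixed bedrock, 40% topographically constrained.
print(encode_site("mixed", 0.4))
```

Continuous features of this kind let tree-based learners such as Random Forest and XGBoost split at data-driven thresholds instead of inheriting a rigid two-way partition.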
Abrahams et al. (2024) report impressive performance metrics, with F1 scores exceeding 94% for both Random Forest and the ensemble models. However, these figures must be viewed in light of the significant class imbalance present in the training data. The data set contains over 600 000 positive-relief features, but only a small proportion of these are identified as true subglacial bedforms. This creates an inherent imbalance between the classes of interest (glacial bedforms) and non-glacial features (e.g. rock outcrops, sedimentary anomalies, etc.), which has the potential to skew model performance.
In ML tasks involving class imbalance, standard evaluation metrics like accuracy can be misleading, as models can achieve high accuracy simply by predicting the dominant class (in this case, ‘non-bedform’ features) without ever identifying true bedforms. To account for this, the authors use additional metrics, such as precision, recall, and the F1 score, which better reflect the model's ability to detect true positives while minimizing false positives and false negatives. While these metrics indicate a strong model performance, the authors do not sufficiently address the influence of the class imbalance on these results.
The authors correctly point out that the model's tendency to overpredict false positives (especially in OOD regions) is a result of prioritizing the detection of bedforms rather than avoiding false positives. While this approach may be acceptable for applications where missing a bedform is more problematic than misclassifying a non-bedform, it remains essential to fully understand the implications of such decisions. False positives – where non-glacial features are misclassified as bedforms – could significantly alter the interpretation of a landscape, leading to incorrect assumptions about past ice-flow dynamics and glacial behaviour.
Furthermore, the false negative rate, while implicitly addressed through the model's emphasis on recovering more bedforms, is not sufficiently explored. A higher false positive rate may be preferable in certain circumstances, but the trade-off with false negatives needs careful evaluation. For example, in a region with complex geomorphology, the presence of non-glacial features might lead to substantial false positive detections, which could undermine the scientific value of the model's predictions.
To improve the robustness of the results, it would be beneficial to conduct a more comprehensive evaluation of the trade-offs between false positives and false negatives. The authors could employ a precision-recall curve (Saito & Rehmsmeier 2017; Fu et al. 2019; Williams 2021) to assess the model's performance in greater detail. This would allow for a better understanding of how different threshold values for classification impact the balance between precision and recall, ultimately providing a clearer picture of the model's practical utility in various contexts. In addition, applying cross-validation techniques using different training and test sets would provide a more rigorous assessment of the model's generalizability. Since the data set is likely to contain spatial and temporal dependencies, the authors could consider employing spatial cross-validation, where training and test sets are split based on geographical regions rather than random selection (Allen & Kim 2020; Palm et al. 2023). This approach would more accurately capture the model’s ability to generalize across different spatial scales and environmental conditions, further validating its effectiveness for real-world applications.
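A leave-one-region-out splitter of the kind suggested here can be sketched in a few lines. The region labels below are invented placeholders; in practice one would likely reach for a library routine such as scikit-learn's GroupKFold, which implements the same grouping idea.

```python
# Sketch of spatial (leave-one-region-out) cross-validation splitting.
from collections import defaultdict

def leave_one_region_out(regions):
    """Yield (held_out_region, train_idx, test_idx), holding out one
    geographical region at a time instead of splitting randomly."""
    by_region = defaultdict(list)
    for i, r in enumerate(regions):
        by_region[r].append(i)
    for held_out, test_idx in sorted(by_region.items()):
        train_idx = sorted(
            i for r, idx in by_region.items() if r != held_out for i in idx
        )
        yield held_out, train_idx, test_idx

# Invented region labels, one per candidate feature.
regions = ["scandinavia", "laurentide", "laurentide", "patagonia", "scandinavia"]
for held_out, train, test in leave_one_region_out(regions):
    print(held_out, train, test)
```

Because every test fold is a region the model never saw during training, the resulting scores approximate performance on genuinely out-of-distribution terrain rather than on spatially autocorrelated neighbours.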
The authors introduce a scientifically driven filtering method, referred to as McKenzie et al. (2022) filtering, which aims to narrow the data set to only those features that are likely to be streamlined subglacial bedforms. This method is intended to mitigate the errors associated with false positives and non-glacial features, ensuring that the model is trained on a high-quality subset of the data. However, while filtering improves model performance by excluding spurious features, it introduces a level of subjectivity into the process that warrants further scrutiny.
The reliance on manual filtering – albeit guided by scientific principles – can introduce biases into the data set that are difficult to quantify. For example, the decision to exclude certain bedform types or morphologies based on subjective criteria may unintentionally eliminate features that are of scientific interest or that challenge conventional definitions of subglacial bedforms. As a result, the model may be trained on a data set that is more homogeneous than the actual diversity of features encountered in the field.
Moreover, the combination of filtering methods – such as McKenzie filtering alongside Near Miss – deserves more attention in terms of its effectiveness. The authors suggest that the integration of these methods results in improved detection accuracy, but the rationale for this combination is not fully explained. Is it the scientific knowledge embedded in McKenzie filtering that enhances model performance, or is it the statistical adjustment provided by Near Miss? The authors do not sufficiently discuss how these methods interact, which makes it difficult to assess the true impact of their combined use.
To enhance the scientific rigour of the filtering process, the authors could benefit from conducting a more transparent comparison of different filtering strategies. Ablation studies (Cosmo et al. 2024; Tseng et al. 2024) could be employed to evaluate the independent contribution of each filtering method and determine whether the combination of McKenzie filtering and Near Miss is truly optimal. Additionally, the authors could test the model's performance on unfiltered data sets to assess whether the added complexity of filtering results in a tangible improvement in predictive accuracy or merely serves to reduce the diversity of features considered. In the future, incorporating a flexible, data-driven filtering mechanism – perhaps informed by unsupervised clustering or density-based spatial analysis – could help automate the feature selection process while reducing potential biases introduced by manual choices. This would allow the model to adjust to new data without overfitting to pre-existing assumptions.
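For readers unfamiliar with Near Miss, the following minimal sketch implements version 1 of the heuristic: keep the majority-class samples whose mean distance to their nearest minority-class samples is smallest. The 2-D feature values are invented for illustration; production code would normally use imbalanced-learn's NearMiss class rather than this hand-rolled version.

```python
# Stdlib sketch of NearMiss-1 undersampling (invented toy features).
import math

def nearmiss1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority samples with the smallest mean distance
    to their k nearest minority samples."""
    scored = []
    for m in majority:
        nearest = sorted(math.dist(m, x) for x in minority)[:k]
        scored.append((sum(nearest) / len(nearest), m))
    scored.sort(key=lambda t: t[0])
    return [m for _, m in scored[:n_keep]]

minority = [(0.0, 0.0), (1.0, 1.0)]                           # e.g. confirmed bedforms
majority = [(0.5, 0.5), (5.0, 5.0), (0.2, 0.1), (9.0, 2.0)]   # non-bedform candidates
print(nearmiss1(majority, minority, n_keep=2))  # distant outliers are discarded
```

An ablation would then train once on data reduced by this statistical step alone, once with McKenzie-style expert filtering alone, and once with both, isolating each method's contribution.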
Abrahams et al. (2024) present a significant advance in automating the identification of streamlined subglacial bedforms, a task traditionally dominated by manual methods that are prone to errors and biases. While the study demonstrates the potential of machine learning to transform geomorphology, several important challenges remain. These include issues related to data selection, model performance interpretation, and the integration of scientific filtering methods. By addressing these concerns, the authors could significantly improve the robustness and generalizability of the tool. Future iterations of bedfinder could benefit from expanding the training data set, refining the model's ability to handle class imbalance, and introducing a more flexible filtering mechanism.
With these improvements, bedfinder could become a cornerstone for future research in glacial geomorphology, offering a powerful, reproducible and accessible tool for understanding past ice dynamics and advancing the field of glaciology.
Author contributions. ML: writing – original draft. HZ: formal analysis. TY: conceptualization, supervision, funding acquisition, writing – review and editing.
About the journal:
Boreas has been published since 1972. Articles of wide international interest from all branches of Quaternary research are published. Biological as well as non-biological aspects of the Quaternary environment, in both glaciated and non-glaciated areas, are dealt with: Climate, shore displacement, glacial features, landforms, sediments, organisms and their habitat, and stratigraphical and chronological relationships.
Anticipated international interest, at least within a continent or a considerable part of it, is a main criterion for the acceptance of papers. Besides articles, short items like discussion contributions and book reviews are published.