Classifying detrital zircon U-Pb age distributions using automated machine learning

Impact factor: 3.2 · JCR Q2 · Computer Science, Interdisciplinary Applications

Applied Computing and Geosciences, vol. 26, Article 100251 · Pub Date: 2025-05-16 · DOI: 10.1016/j.acags.2025.100251

Jack W. Fekete, Glenn R. Sharman, Xiao Huang


Citations: 0


Classifying detrital zircon U-Pb age distributions using automated machine learning

The prodigious use of detrital zircon U-Pb geochronology for provenance studies in recent decades has led many researchers to amass extensive datasets (>100,000 dates). When displayed as age distributions, individual samples are traditionally compared using visual inspection and statistical methods, which can become time-consuming and challenging when using large datasets. We propose that machine learning (ML) can more efficiently classify a sample by its source using detrital zircon U-Pb age distributions. Specifically, we hypothesize that automated machine learning (AutoML), which optimizes algorithm selection and hyperparameters, will outperform an unoptimized Random Forest (RF) classifier and the cross-correlation coefficient (R²), a commonly used metric for comparing age distributions. We test this approach using a well-constrained synthetic dataset and a natural dataset from the Jurassic-Eocene North American Cordillera. In synthetic experiments, AutoML models effectively classify samples by their sources when inter-source similarity across few sources is low to moderate and samples have more than ∼50 analyses. However, the effectiveness of AutoML is highly dependent on sample size and the variability of age modes within the data. Applied to the North American Cordillera dataset, AutoML achieves an ∼0.91 F₁ score when predicting between foreland and forearc basin tectonic settings and an ∼0.71 F₁ score when predicting subbasins within these settings, outperforming both RF and R². Moreover, AutoML identifies discriminating age populations between groups, with the average feature importance of 100 models highlighting the 145–125 Ma age range, corresponding to a magmatic lull of the Cordilleran magmatic arc. These results demonstrate AutoML's potential as a powerful predictive and interpretive tool in detrital zircon studies.
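The R² baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the cross-correlation coefficient is taken as the squared Pearson correlation between two density curves evaluated on a common age grid, and that a sample is assigned to whichever reference distribution gives the highest R². The function names, the 20 Myr kernel bandwidth, and the toy ages are all hypothetical and chosen only for illustration; this is not the authors' implementation, and it does not reproduce the AutoML or Random Forest workflow.

```python
import math


def kde(ages, grid, bandwidth=20.0):
    """Gaussian kernel density estimate of ages (Ma) evaluated on grid.

    The fixed bandwidth (in Myr) is an assumption made for this sketch.
    """
    norm = 1.0 / (len(ages) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - a) / bandwidth) ** 2) for a in ages)
        for x in grid
    ]


def cross_correlation_r2(p, q):
    """Squared Pearson correlation between two densities on the same grid."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    return cov * cov / (var_p * var_q)


def classify_by_r2(sample, references, grid):
    """Return the reference name with the highest R² against the sample."""
    sample_density = kde(sample, grid)
    scores = {
        name: cross_correlation_r2(sample_density, kde(ages, grid))
        for name, ages in references.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy ages in Ma; real samples would have far more analyses.
grid = list(range(0, 3001, 10))
references = {
    "source_A": [95, 100, 105, 160, 165, 170, 1050, 1100],
    "source_B": [130, 135, 140, 400, 420, 1700, 1750, 1800],
}
sample = [98, 102, 110, 158, 168, 1080]

label, scores = classify_by_r2(sample, references, grid)
print(label, {name: round(value, 2) for name, value in scores.items()})
```

A similarity metric of this kind compares a sample against one reference at a time, whereas a classifier such as RF learns from many labeled samples at once; the abstract reports that AutoML outperforms both approaches on the Cordilleran dataset.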


Source journal: Applied Computing and Geosciences (Computer Science: General Computer Science)

CiteScore: 5.50

Self-citation rate: 0.00%

Review time: 5 weeks