Classifying detrital zircon U-Pb age distributions using automated machine learning

IF 3.2, JCR Q2 (Computer Science, Interdisciplinary Applications)
Jack W. Fekete, Glenn R. Sharman, Xiao Huang
Journal: Applied Computing and Geosciences, volume 26, Article 100251
Published: 2025-05-16
DOI: 10.1016/j.acags.2025.100251
URL: https://www.sciencedirect.com/science/article/pii/S2590197425000333
Citations: 0

Abstract

The prodigious use of detrital zircon U-Pb geochronology for provenance studies in recent decades has led many researchers to amass extensive datasets (>100,000 dates). When displayed as age distributions, individual samples are traditionally compared using visual inspection and statistical methods, which can become time-consuming and challenging when using large datasets. We propose that machine learning (ML) can more efficiently classify a sample by its source using detrital zircon U-Pb age distributions. Specifically, we hypothesize that automated machine learning (AutoML), which optimizes algorithm selection and hyperparameters, will outperform an unoptimized Random Forest (RF) classifier and the cross-correlation coefficient (R²), a commonly used metric for comparing age distributions. We test this approach using a well-constrained synthetic dataset and a natural dataset from the Jurassic-Eocene North American Cordillera. In synthetic experiments, AutoML models effectively classify samples by their sources when inter-source similarity across few sources is low to moderate and samples have more than ∼50 analyses. However, the effectiveness of AutoML is highly dependent on sample size and the variability of age modes within the data. Applied to the North American Cordillera dataset, AutoML achieves an ∼0.91 F₁ score when predicting between foreland and forearc basin tectonic settings and an ∼0.71 F₁ score when predicting subbasins within these settings, outperforming both RF and R². Moreover, AutoML identifies discriminating age populations between groups, with the average feature importance of 100 models highlighting the 145–125 Ma age range, corresponding to a magmatic lull of the Cordilleran magmatic arc. These results demonstrate AutoML's potential as a powerful predictive and interpretive tool in detrital zircon studies.
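The two evaluation quantities named in the abstract can be illustrated with a small sketch. It is not the authors' pipeline. It assumes each sample is turned into a fixed-bandwidth Gaussian kernel density estimate on a shared age grid, takes the cross-correlation coefficient as the squared Pearson correlation between two such curves, and computes F₁ as the harmonic mean of precision and recall. The function names, the 20 Myr bandwidth, the 10 Myr grid spacing, and the synthetic age modes are all hypothetical choices made for illustration.

```python
import numpy as np

def age_kde(ages_ma, grid_ma, bandwidth_ma=20.0):
    """Fixed-bandwidth Gaussian KDE of U-Pb ages on a shared age grid.

    The bandwidth is an assumed value, not one taken from the paper.
    The curve is normalized so that its values sum to 1.
    """
    ages = np.asarray(ages_ma, dtype=float)[:, None]
    grid = np.asarray(grid_ma, dtype=float)[None, :]
    density = np.exp(-0.5 * ((grid - ages) / bandwidth_ma) ** 2).sum(axis=0)
    return density / density.sum()

def cross_correlation_r2(density_a, density_b):
    """Squared Pearson correlation between two density curves on the same grid."""
    r = np.corrcoef(density_a, density_b)[0, 1]
    return float(r ** 2)

def f1_score_binary(y_true, y_pred, positive):
    """F1 = 2*TP / (2*TP + FP + FN) for one class treated as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Synthetic, hypothetical sources: each is a mixture of two age modes (Ma).
rng = np.random.default_rng(0)
grid = np.arange(0.0, 3000.0, 10.0)

def draw_source_a(n_young=60, n_old=40):
    return np.concatenate([rng.normal(100, 10, n_young), rng.normal(1100, 50, n_old)])

def draw_source_b(n_young=60, n_old=40):
    return np.concatenate([rng.normal(165, 10, n_young), rng.normal(1750, 50, n_old)])

kde_a1 = age_kde(draw_source_a(), grid)
kde_a2 = age_kde(draw_source_a(), grid)
kde_b = age_kde(draw_source_b(), grid)

r2_same_source = cross_correlation_r2(kde_a1, kde_a2)
r2_diff_source = cross_correlation_r2(kde_a1, kde_b)

# Toy labels only, to show the F1 arithmetic (not results from the study).
y_true = ["foreland", "forearc", "foreland", "forearc"]
y_pred = ["foreland", "foreland", "foreland", "forearc"]
f1_foreland = f1_score_binary(y_true, y_pred, positive="foreland")
```

In this toy setup, two samples drawn from the same source give a higher R² than samples from different sources, which is the behavior a similarity-based comparison depends on. The discretized density vectors are also one plausible feature representation that a classifier such as RF or an AutoML search could consume, though the abstract does not specify the features the authors used.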
Source journal: Applied Computing and Geosciences (Computer Science, General Computer Science)
CiteScore: 5.50
Self-citation rate: 0.00%
Articles published: 23
Review time: 5 weeks