Classifying detrital zircon U-Pb age distributions using automated machine learning
Jack W. Fekete, Glenn R. Sharman, Xiao Huang
Applied Computing and Geosciences, Volume 26, Article 100251 (2025-05-16)
DOI: 10.1016/j.acags.2025.100251
https://www.sciencedirect.com/science/article/pii/S2590197425000333
The prodigious use of detrital zircon U-Pb geochronology for provenance studies in recent decades has led many researchers to amass extensive datasets (>100,000 dates). When displayed as age distributions, individual samples are traditionally compared using visual inspection and statistical methods, which can become time-consuming and challenging for large datasets. We propose that machine learning (ML) can classify a sample by its source more efficiently using detrital zircon U-Pb age distributions. Specifically, we hypothesize that automated machine learning (AutoML), which optimizes algorithm selection and hyperparameters, will outperform both an unoptimized Random Forest (RF) classifier and the cross-correlation coefficient (R²), a commonly used metric for comparing age distributions. We test this approach using a well-constrained synthetic dataset and a natural dataset from the Jurassic–Eocene North American Cordillera. In synthetic experiments, AutoML models effectively classify samples by source when there are few sources, inter-source similarity is low to moderate, and samples contain more than ∼50 analyses. However, AutoML's effectiveness is highly dependent on sample size and on the variability of age modes within the data. Applied to the North American Cordillera dataset, AutoML achieves an F₁ score of ∼0.91 when distinguishing foreland from forearc basin tectonic settings and ∼0.71 when predicting subbasins within these settings, outperforming both RF and R². Moreover, AutoML identifies age populations that discriminate between groups: feature importances averaged over 100 models highlight the 145–125 Ma age range, which corresponds to a magmatic lull in the Cordilleran magmatic arc. These results demonstrate AutoML's potential as a powerful predictive and interpretive tool in detrital zircon studies.
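The R² baseline compares two samples through their age distributions. The sketch below assumes the common formulation: each sample is rendered as a probability density plot (PDP), a sum of Gaussians centred on each grain age with that grain's 1σ uncertainty as the width, and R² is the squared Pearson correlation between two PDPs evaluated on the same age grid. The abstract does not say how R² was turned into a classifier; assigning a sample to the reference source with the highest R² (`classify_by_r2`) is an assumption here, and the function names, the 1-Myr grid, the constant 3-Myr uncertainty, and the example ages are all hypothetical.

```python
import math


def pdp(ages, sigmas, grid):
    """Probability density plot (PDP): an equally weighted sum of Gaussians,
    one per dated grain (mean = age, sd = that grain's 1-sigma uncertainty),
    evaluated at each age in `grid` (Ma)."""
    norm = 1.0 / (len(ages) * math.sqrt(2.0 * math.pi))
    curve = []
    for t in grid:
        total = 0.0
        for a, sd in zip(ages, sigmas):
            z = (t - a) / sd
            total += math.exp(-0.5 * z * z) / sd
        curve.append(norm * total)
    return curve


def cross_correlation_r2(x, y):
    """Squared Pearson correlation between two curves sampled on the same grid.
    1 means identical shape; values near 0 mean dissimilar distributions.
    Raises ZeroDivisionError if either curve is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def classify_by_r2(sample_curve, reference_curves):
    """Hypothetical nearest-reference rule: return the source label whose PDP
    has the highest R^2 with the sample. All curves share one grid."""
    return max(
        reference_curves,
        key=lambda label: cross_correlation_r2(sample_curve, reference_curves[label]),
    )


# Hypothetical example: 1-Myr grid from 0 to 400 Ma, constant 3-Myr uncertainty.
grid = [float(t) for t in range(0, 401)]
refs = {
    "source_A": pdp([95, 100, 105, 160, 165], [3.0] * 5, grid),
    "source_B": pdp([240, 250, 255, 330], [3.0] * 4, grid),
}
sample = pdp([98, 102, 162], [3.0] * 3, grid)
print(classify_by_r2(sample, refs))  # source_A
```

Because Pearson correlation ignores positive rescaling, R² depends only on the shape of each PDP, so the curves need not be normalized to unit area before comparison.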
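RF and AutoML classifiers need each sample as a fixed-length feature vector, whereas raw samples contain different numbers of dates. The abstract does not describe the paper's feature construction; the sketch below assumes fixed-width age bins whose values are the proportion of a sample's grains in each bin. With features of this kind, importances map directly onto age ranges, which is how a signal such as the 145–125 Ma range can be read off. The 5-Myr bin width, the 0–4000 Ma range, and the helper names (`age_bin_features`, `bin_range`, `mean_importance`) are hypothetical; `mean_importance` simply averages importance vectors across repeated model fits, analogous to averaging over 100 models.

```python
import math


def age_bin_features(ages, start=0.0, stop=4000.0, width=5.0):
    """Convert one sample's U-Pb ages (Ma) into a fixed-length feature vector:
    the proportion of grains falling in each [lo, lo + width) bin.
    Ages outside [start, stop) are ignored."""
    nbins = math.ceil((stop - start) / width)
    counts = [0] * nbins
    for a in ages:
        if start <= a < stop:
            i = min(int((a - start) // width), nbins - 1)
            counts[i] += 1
    kept = sum(counts)
    # Proportions rather than raw counts, so samples with different n are comparable.
    return [c / kept for c in counts] if kept else [0.0] * nbins


def bin_range(index, start=0.0, width=5.0):
    """Age range (lo, hi) in Ma covered by feature `index`, used to read
    feature importances back as age populations."""
    lo = start + index * width
    return (lo, lo + width)


def mean_importance(importance_runs):
    """Average per-feature importances across repeated model fits
    (each run is a list with one importance value per bin)."""
    n = len(importance_runs)
    return [sum(col) / n for col in zip(*importance_runs)]


feats = age_bin_features([127, 132, 141, 1100])
# With 5-Myr bins, indices 25-28 together span 125-145 Ma.
print(bin_range(25), bin_range(28))
```

Binning trades resolution for a stable feature space: narrower bins resolve short-lived age populations but leave most bins empty for samples with few analyses.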
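The ∼0.91 and ∼0.71 figures are F₁ scores. For a single class, F₁ is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The abstract does not state how per-class scores were combined; the sketch below assumes an unweighted (macro) mean, and the function name and the toy foreland/forearc labels are hypothetical.

```python
def f1_scores(y_true, y_pred):
    """Per-class F1 = 2TP / (2TP + FP + FN) and their unweighted (macro) mean."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # Every label occurs in y_true or y_pred, so the denominator is > 0.
        per_class[label] = 2 * tp / (2 * tp + fp + fn)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Toy example: one foreland sample is misclassified as forearc.
y_true = ["foreland", "foreland", "forearc", "forearc"]
y_pred = ["foreland", "forearc", "forearc", "forearc"]
per_class, macro = f1_scores(y_true, y_pred)
print(per_class, macro)
```

A macro mean weights every class equally, so a poorly predicted minority class (for example, a small subbasin) pulls the score down as much as a large one would.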