Enhancing machine learning thermobarometry for clinopyroxene-bearing magmas

IF 4.2 | CAS Zone 2, Earth Sciences | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Geosciences | Pub Date: 2024-08-31 | DOI: 10.1016/j.cageo.2024.105707

Mónica Ágreda-López, Valerio Parodi, Alessandro Musu, Corin Jorgenson, Alessandro Carfì, Fulvio Mastrogiovanni, Luca Caricchi, Diego Perugini, Maurizio Petrelli

Computers & Geosciences, Volume 193, Article 105707.

Citations: 0

Abstract

In this study, we propose a general workflow to enhance machine learning (ML)-based geothermobarometer modelling. The workflow focuses on three key areas. First, we developed a robust pre-processing pipeline that addresses data imbalance, feature engineering, and data augmentation. Second, we assessed modelling errors using a Monte Carlo approach to quantify the impact of analytical uncertainties on the final pressure and temperature estimates. Third, we implemented a robust strategy to validate and test the ML models, avoiding over- and under-fitting while correcting biases associated with specific ML models (i.e., tree-based ensembles).
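The Monte Carlo step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `monte_carlo_estimate`, the toy model, the oxide values, and the relative uncertainties are all hypothetical, and Gaussian, independent analytical errors are an assumption made here for illustration. The idea is to perturb each measured oxide within its analytical uncertainty, re-run the predictor on every perturbed composition, and summarise the spread of the resulting estimates.

```python
import random
import statistics
from typing import Callable, Dict, Tuple


def monte_carlo_estimate(
    predict: Callable[[Dict[str, float]], float],
    composition: Dict[str, float],
    rel_sigma: Dict[str, float],
    n_draws: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Propagate analytical uncertainty through a predictor.

    Each oxide is redrawn from a Gaussian centred on the measured value
    with standard deviation rel_sigma[oxide] * value (assumed independent).
    Returns the median and sample standard deviation of the predictions.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbed = {
            # Clip at zero: a negative oxide concentration is not physical.
            oxide: max(0.0, rng.gauss(value, rel_sigma.get(oxide, 0.0) * value))
            for oxide, value in composition.items()
        }
        draws.append(predict(perturbed))
    return statistics.median(draws), statistics.stdev(draws)


def toy_pressure_model(oxides: Dict[str, float]) -> float:
    """Hypothetical stand-in for a trained regressor (kbar); not a real calibration."""
    return 2.0 * oxides["Al2O3"] + 5.0 * oxides["Na2O"]


# Illustrative clinopyroxene analysis (wt%) and relative 1-sigma uncertainties.
cpx = {"SiO2": 50.1, "Al2O3": 4.2, "Na2O": 0.45}
sigma = {"SiO2": 0.01, "Al2O3": 0.03, "Na2O": 0.08}

p_median, p_std = monte_carlo_estimate(toy_pressure_model, cpx, sigma)
print(f"P = {p_median:.2f} +/- {p_std:.2f} kbar (toy model)")
```

In practice `predict` would wrap a trained model; the median and spread of the draws then give a per-sample estimate and an uncertainty attributable to analytical error alone.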

To facilitate the use of our workflow, we have developed a web app (https://bit.ly/ml-pt-web) and a Python module (https://bit.ly/ml-pt-py). The robustness of this strategy has been tested on two calibrations: clinopyroxene (cpx) and clinopyroxene-liquid (cpx-liq). Our results show a significant reduction in errors compared to the baseline model, as well as good generalization ability on an independent external dataset. The Root Mean Squared Errors are 57 °C and 2.5 kbar for the cpx calibration, and 36 °C and 2.1 kbar for the cpx-liq calibration. Finally, our models show improved outcomes on the external dataset compared to existing ML and classical cpx and cpx-liq thermobarometers.
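For reference, the root mean squared error quoted above is the square root of the mean squared difference between observed and predicted values. The sketch below shows that calculation only; the function name `rmse` and the sample temperatures are illustrative and are not taken from the paper's datasets.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (degrees C); not data from the study.
t_observed = [1100.0, 1150.0, 1200.0, 1250.0]
t_predicted = [1120.0, 1130.0, 1210.0, 1240.0]
print(f"RMSE = {rmse(t_observed, t_predicted):.1f} degrees C")
```

Because the errors are squared before averaging, a few large misfits raise the RMSE more than many small ones, which is why it is usually reported on data held out from training.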


Source journal

Computers & Geosciences (Geosciences, Multidisciplinary)

CiteScore: 9.30

Self-citation rate: 6.80%

Articles published: 164

Review time: 3.4 months

Journal description: Computers & Geosciences publishes high-impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.