Machine Learning model interpretability using SHAP values: Application to Igneous Rock Classification task

Impact factor: 2.6 · JCR Q2 (Computer Science, Interdisciplinary Applications)
Antonella S. Antonini, Juan Tanzola, Lucía Asiain, Gabriela R. Ferracutti, Silvia M. Castro, Ernesto A. Bjerg, María Luján Ganuza
Applied Computing and Geosciences, Volume 23, Article 100178. Published 2024-07-22. DOI: 10.1016/j.acags.2024.100178
Full text: https://www.sciencedirect.com/science/article/pii/S2590197424000259
Citations: 0

Abstract


The El Fierro intrusive body is one of the bodies that make up the La Jovita–Las Aguilas mafic–ultramafic belt, located in the Sierra Grande de San Luis, Argentina. The units of this belt carry base metal sulfide (BMS) mineralization and platinum group minerals (PGM). The macroscopic description of mafic and ultramafic rocks, as usually performed by mining exploration companies, leads to an imprecise modal classification of the rocks. In this study, we develop a random forest-based prediction model that uses geochemical parameters to classify mafic and ultramafic rocks intercepted by drill cores. The model showed an accuracy between 86% and 94% and an f1_score of 96%. Random forest classification is a widely adopted machine learning approach for constructing predictive models across various research domains. However, as models become more complex, interpreting them can become considerably more difficult. To interpret the model results, we use both global and local perspectives, incorporating the SHAP (SHapley Additive exPlanations) method. The SHAP technique allows us to analyze individual samples using force plots and provides a measure of the importance of each geochemical input attribute to the model output. Analyzing the contribution of each input feature to the model identified the three variables with the highest contributions, in the following order: Al₂O₃, MgO, and Sr.
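
The local and global perspectives mentioned in the abstract can be illustrated with a small sketch. It computes exact Shapley values by enumerating feature coalitions, which is the quantity that SHAP estimates efficiently for real models. Everything here is an assumption made for illustration: `toy_score` is a hypothetical hand-written scoring rule (not the authors' random forest), the thresholds, baseline, and sample values are invented, and only the three feature names come from the abstract. The resulting numbers say nothing about the paper's findings.

```python
from itertools import combinations
from math import factorial

# Feature names taken from the abstract; all numeric values below are invented.
FEATURES = ["Al2O3", "MgO", "Sr"]


def toy_score(sample):
    """Hypothetical stand-in for a classifier score (NOT the paper's model)."""
    al2o3, mgo, sr = sample
    score = 0.0
    if mgo > 20.0:
        score += 0.5
    if al2o3 < 8.0:
        score += 0.3
    if mgo > 20.0 and sr < 100.0:  # interaction between two features
        score += 0.2
    return score


def shapley_values(model, sample, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(sample)

    def value(coalition):
        mixed = [sample[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


# Invented reference point and samples (hypothetical geochemical values).
baseline = [15.0, 10.0, 300.0]
samples = [[5.0, 30.0, 50.0], [12.0, 25.0, 400.0], [6.0, 12.0, 80.0]]

# Local view: one attribution vector per sample (what a force plot displays).
local = [shapley_values(toy_score, s, baseline) for s in samples]

# Global view: mean absolute attribution per feature across samples.
global_importance = [
    sum(abs(phi[i]) for phi in local) / len(local) for i in range(len(FEATURES))
]

for name, importance in zip(FEATURES, global_importance):
    print(name, round(importance, 3))
```

For each sample, the attributions sum to the difference between the score for that sample and the score for the baseline. This additivity is what lets a force plot show features pushing the prediction above or below a base value. Exhaustive enumeration is only practical for a handful of features; for tree ensembles such as random forests, the SHAP library provides dedicated estimators instead.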

Source journal: Applied Computing and Geosciences (Computer Science, General Computer Science)
CiteScore: 5.50 · Self-citation rate: 0.00% · Articles published: 23 · Review time: 5 weeks