Integration of sparse and continuous data sets using machine learning for core mineralogy interpretation

Q2 Earth and Planetary Sciences
Leading Edge. Pub Date: 2023-06-01. DOI: 10.1190/tle42060421.1
M. Nawal, B. Shekar, P. Jaiswal
Citations: 0

Abstract
In earth science, integrating noninvasive continuous data streams with discrete invasive measurements remains an open challenge. We address one such problem: predicting whole-core mineralogy from discrete measurements with the help of machine learning. Our targets are sparsely sampled mineralogy from X-ray diffraction, and our features are continuously sampled elemental oxides from X-ray fluorescence. Both data sets were acquired on a core cut from a Mississippian-age mixed siliciclastic-carbonate formation in the U.S. midcontinent. The novelty lies in predicting multiple classes of output targets from input features in a small multidimensional data setting. Our workflow has three salient aspects. First, it shows that single-output models relating selected target-feature subsets are more effective than a multi-output model that relates the entire target-feature set simultaneously. Specifically, we adopt a competitive ensemble strategy comprising three classes of regression algorithms: elastic net (linear regression), XGBoost (tree-based), and feedforward neural networks (nonlinear regression). Second, it shows that feature selection and engineering, when guided by statistical relationships within the data set and by domain knowledge, can significantly improve target predictability. Third, it incorporates k-fold cross-validation and grid-search-based parameter tuning to predict targets within 4%–6% accuracy using 40% of the data for training. The results open doors to generating a wealth of information in energy, environmental, and climate sciences, where remotely sensed data are inexpensive and abundant but physical sampling may be limited by analytic, logistic, or economic issues.
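The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scikit-learn is available, uses synthetic data in place of the XRF and XRD measurements, and substitutes scikit-learn's GradientBoostingRegressor for XGBoost so that the example needs only one library. The target names (quartz, calcite, clay), the parameter grids, and the 4-fold split are hypothetical choices made for illustration. For each target separately, three model classes are tuned by grid search with k-fold cross-validation on 40% of the samples, and the model with the best cross-validated score is kept.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's): 60 sample depths,
# 5 oxide-like features, 3 mineral-like targets with hypothetical names.
n_samples, n_features = 60, 5
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
Y = np.column_stack([
    0.6 * X[:, 0] + 0.1 * X[:, 1],
    0.5 * X[:, 2] ** 2 + 0.2 * X[:, 3],
    0.4 * X[:, 4] + 0.3 * X[:, 0] * X[:, 1],
]) + rng.normal(0.0, 0.01, size=(n_samples, 3))
target_names = ["quartz", "calcite", "clay"]  # hypothetical labels

# Train on 40% of the samples, as the abstract states; hold out the rest.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=0.4, random_state=0
)

# Three model classes, each paired with a small illustrative grid.
candidates = {
    "elastic_net": (
        make_pipeline(StandardScaler(), ElasticNet(max_iter=10000)),
        {"elasticnet__alpha": [0.001, 0.01, 0.1],
         "elasticnet__l1_ratio": [0.2, 0.8]},
    ),
    # Assumption: gradient-boosted trees stand in for XGBoost here.
    "boosted_trees": (
        GradientBoostingRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [2, 3]},
    ),
    "neural_net": (
        make_pipeline(
            StandardScaler(),
            MLPRegressor(solver="lbfgs", max_iter=2000, random_state=0),
        ),
        {"mlpregressor__hidden_layer_sizes": [(8,), (16,)],
         "mlpregressor__alpha": [1e-4, 1e-2]},
    ),
}

cv = KFold(n_splits=4, shuffle=True, random_state=0)
winners = {}

# One single-output model per target: the candidates compete on
# cross-validated mean absolute error, and the best one is kept.
for j, name in enumerate(target_names):
    best = None
    for label, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=cv,
                              scoring="neg_mean_absolute_error")
        search.fit(X_train, Y_train[:, j])
        if best is None or search.best_score_ > best[1]:
            best = (label, search.best_score_, search.best_estimator_)
    label, cv_score, estimator = best
    test_mae = float(np.mean(np.abs(estimator.predict(X_test) - Y_test[:, j])))
    winners[name] = {"model": label, "cv_mae": -cv_score, "test_mae": test_mae}
    print(f"{name}: {label}, CV MAE={-cv_score:.3f}, test MAE={test_mae:.3f}")
```

Because each target is fitted independently, different targets may end up with different winning model classes, which is the sense in which the ensemble is competitive in this sketch. The errors it prints describe the synthetic data only and say nothing about the accuracy reported in the abstract.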