Integration of sparse and continuous data sets using machine learning for core mineralogy interpretation
Authors: M. Nawal, B. Shekar, P. Jaiswal
Journal: Leading Edge (JCR Q2, Earth and Planetary Sciences)
Publication date: 2023-06-01
Publication type: Journal Article
DOI: https://doi.org/10.1190/tle42060421.1
Citations: 0
Abstract
In earth science, integrating noninvasive continuous data streams with discrete invasive measurements remains an open challenge. We address one such problem: predicting whole-core mineralogy from discrete measurements with the help of machine learning. Our targets are sparsely sampled mineralogy from X-ray diffraction, and our features are continuously sampled elemental oxides from X-ray fluorescence. Both data sets were acquired on a core cut from a Mississippian-age mixed siliciclastic-carbonate formation in the U.S. midcontinent. The novelty lies in predicting multiple classes of output targets from input features in a small multidimensional data setting. Our workflow has three salient aspects. First, it shows that single-output models relating selected target-feature subsets are more effective than a multi-output model relating the entire target-feature set simultaneously. Specifically, we adopt a competitive ensemble strategy comprising three classes of regression algorithms: elastic net (linear regression), XGBoost (tree-based), and feedforward neural networks (nonlinear regression). Second, it shows that feature selection and engineering, when guided by statistical relationships within the data set and by domain knowledge, can significantly improve target predictability. Third, it incorporates k-fold cross-validation and grid-search-based parameter tuning to predict targets to within 4%–6% accuracy using 40% of the data for training. The results open doors to generating a wealth of information in energy, environmental, and climate sciences, where remotely sensed data are inexpensive and abundant but physical sampling may be limited by analytic, logistic, or economic constraints.
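The abstract gives no code, so the following is only a minimal sketch of how a per-target "competitive" model selection with k-fold cross-validation and grid search might look in scikit-learn. Several things here are assumptions rather than the authors' setup: the data are synthetic, the target names, parameter grids, and the helper `fit_best_model` are hypothetical, mean absolute error is an arbitrary choice of selection score, and scikit-learn's `GradientBoostingRegressor` is swapped in for XGBoost so that the example needs only one library.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT from the paper): 60 samples,
# 5 "oxide" features, 2 "mineral" targets.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 5))
Y = np.column_stack([
    0.6 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.02, 60),  # near-linear target
    0.5 * X[:, 2] * X[:, 3] + rng.normal(0, 0.02, 60),        # nonlinear target
])
target_names = ["mineral_a", "mineral_b"]  # hypothetical names

# Hold out 60% of the samples; train on 40%, as in the abstract.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=0.4, random_state=0
)

# One candidate per model family, each with a small, illustrative grid.
candidates = {
    "elastic_net": (
        make_pipeline(StandardScaler(), ElasticNet(max_iter=10000)),
        {"elasticnet__alpha": [0.001, 0.01, 0.1],
         "elasticnet__l1_ratio": [0.2, 0.8]},
    ),
    "boosted_trees": (  # stand-in for XGBoost
        GradientBoostingRegressor(random_state=0),
        {"n_estimators": [50, 150], "max_depth": [2, 3]},
    ),
    "neural_net": (
        make_pipeline(StandardScaler(),
                      MLPRegressor(max_iter=2000, random_state=0)),
        {"mlpregressor__hidden_layer_sizes": [(8,), (16,)],
         "mlpregressor__alpha": [1e-4, 1e-2]},
    ),
}


def fit_best_model(X_tr, y_tr, cv):
    """Grid-search every candidate for ONE target; return the CV winner."""
    best_name, best_search = None, None
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv,
                              scoring="neg_mean_absolute_error")
        search.fit(X_tr, y_tr)
        if best_search is None or search.best_score_ > best_search.best_score_:
            best_name, best_search = name, search
    return best_name, best_search


cv = KFold(n_splits=4, shuffle=True, random_state=0)
winners = {}
for j, target in enumerate(target_names):
    # Single-output modeling: each target gets its own independent competition.
    name, search = fit_best_model(X_train, Y_train[:, j], cv)
    test_mae = float(np.mean(np.abs(search.predict(X_test) - Y_test[:, j])))
    winners[target] = (name, test_mae)
    print(f"{target}: winner={name}, held-out MAE={test_mae:.3f}")
```

Fitting each target separately lets a different model family win for each mineral, which is one plausible reading of the single-output, competitive-ensemble idea described above. Restricting each target to its own feature subset would be a natural extension, but the abstract does not specify how those subsets were chosen.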
About the journal:
THE LEADING EDGE complements GEOPHYSICS, SEG's peer-reviewed publication long unrivalled as the world's most respected vehicle for dissemination of developments in exploration and development geophysics. TLE is a gateway publication, introducing new geophysical theory, instrumentation, and established practices to scientists in a wide range of geoscience disciplines. Most material is presented in a semitechnical manner that minimizes mathematical theory and emphasizes practical applications. TLE also serves as SEG's publication venue for official society business.