Uncover implicit associations among geochemical elements using machine learning
Shuguang Zhou, Zhizhong Cheng, Jinlin Wang, Nuo Li, Guo Jiang
Ore Geology Reviews, volume 179, Article 106506. Published 2025-02-16. DOI: 10.1016/j.oregeorev.2025.106506
https://www.sciencedirect.com/science/article/pii/S0169136825000666
Citations: 0
Abstract
Geochemical data are produced for diverse purposes, and element contents are measured with a variety of analytical methods. Because project funding is limited, censored or missing values are common in geochemical data, and the gaps become more pronounced in large datasets. Many data analysis techniques cannot process datasets that contain missing values, which is a significant hurdle for researchers who depend on geochemical data. To address this issue, we used a random forest model to simulate the contents of geochemical elements in rocks and stream sediments. We compared how model parameters and feature variable selection affect the simulation results for major and trace elements. With appropriate model parameters and variable selection, the simulation results for many elements are reliable, and the random forest model generalizes satisfactorily. This research sheds light on the inherent correlations among elements in nature, offers a way to handle missing values in geochemical data, and provides technical support for disciplines such as geology, environmental science, and soil science.
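The idea in the abstract, predicting one element's content from the others so that missing values can be filled in, can be sketched as follows. This is not the authors' implementation: the abstract does not describe their code, data, or parameter values. The data here are synthetic, the three "elements" and the target are hypothetical, and the forest is a deliberately small from-scratch version (bootstrap samples, a random feature subset at each split, averaged regression trees) so that the example needs only the Python standard library.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(X, y, rng, max_features, min_leaf=3, depth=6):
    """Grow one regression tree: a float leaf or (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None
    for j in rng.sample(range(len(X[0])), max_features):  # random feature subset
        cuts = sorted({row[j] for row in X})
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no admissible split: fall back to a leaf
        return sum(y) / len(y)
    _, j, t = best
    go_left = [row[j] <= t for row in X]
    parts = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [yi for yi, g in zip(y, go_left) if g == side]
        parts.append(fit_tree(Xs, ys, rng, max_features, min_leaf, depth - 1))
    return (j, t, parts[0], parts[1])

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, n_trees=20, max_features=2, seed=0):
    """Fit each tree on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx],
                               rng, max_features))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

def r2(true, pred):
    """Coefficient of determination."""
    mean = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean) ** 2 for t in true)
    return 1 - ss_res / ss_tot

# Synthetic data (an assumption for illustration, not real geochemistry):
# elements A and B share a latent factor with the target; element C does not.
rng = random.Random(42)
rows, target = [], []
for _ in range(160):
    latent = rng.random()
    rows.append([10 * latent + rng.gauss(0, 0.5),  # hypothetical element A
                 5 * latent + rng.gauss(0, 0.5),   # hypothetical element B
                 rng.gauss(0, 1)])                 # unrelated element C
    target.append(3 * latent + rng.gauss(0, 0.1))

# Pretend the last 40 samples were never analysed for the target element.
observed = [t if i < 120 else None for i, t in enumerate(target)]
train = [i for i, t in enumerate(observed) if t is not None]
missing = [i for i, t in enumerate(observed) if t is None]

forest = fit_forest([rows[i] for i in train], [observed[i] for i in train])
filled = list(observed)
for i in missing:
    filled[i] = predict_forest(forest, rows[i])

score = r2([target[i] for i in missing], [filled[i] for i in missing])
print(f"R^2 on held-back samples: {score:.2f}")
```

In this sketch, `n_trees`, `max_features`, `min_leaf`, and `depth` stand in for the "model parameters" the abstract refers to, and the choice of columns placed in `rows` stands in for "feature variable selection". Holding back samples with known values, as done here, is one simple way to check how well the model generalizes. For real datasets, a library implementation such as scikit-learn's `RandomForestRegressor` would normally replace the hand-written forest.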
About the journal:
Ore Geology Reviews aims to familiarize all earth scientists with recent advances in a number of interconnected disciplines related to the study of, and search for, ore deposits. The reviews range from brief to longer contributions, but the journal preferentially publishes manuscripts that fill the niche between the typically shorter journal articles and comprehensive book-length treatments, and thus has a special appeal to many authors and readers.