Weilin Chen, Jiyin Zhang, Wenjia Li, Xiang Que, Chenhao Li, Xiaogang Ma
{"title":"Integrating neuro-symbolic AI and knowledge graph for enhanced geochemical prediction in copper deposits","authors":"Weilin Chen, Jiyin Zhang, Wenjia Li, Xiang Que, Chenhao Li, Xiaogang Ma","doi":"10.1016/j.acags.2025.100259","DOIUrl":"10.1016/j.acags.2025.100259","url":null,"abstract":"<div><div>The integration of machine learning (ML) and deep learning (DL) in geoscience has demonstrated great promise for mineral prediction. However, existing approaches are predominantly data-driven and often overlook expert geological knowledge, limiting their interpretability, accuracy, and practical applicability. This study introduces a new method that combines Large Language Models (LLMs), knowledge graphs (KGs), and Neuro-Symbolic AI (NSAI) models to predict mineralization systems in diverse copper deposits, significantly increasing the precision in prediction results. We utilize LLMs to generate KGs from geological literature, extracting symbolic rules that encode domain-specific insights about copper mineralization. These rules, derived dynamically from expert knowledge, are integrated into ML models as guidance during the training and prediction phases. By fusing symbolic reasoning with ML's computational power, our approach overcomes the limitations of black-box models, offering both improved accuracy and transparency in mineral prediction. To validate this method, we apply it to a comprehensive geochemical dataset of global copper deposits. The results show that rule-guided ML models achieve notable performance improvements, outperforming traditional ML methods in accuracy, precision, and robustness. Interpretability is further enhanced by using tools such as SHAP values, which explain the influence of individual geochemical features within the rule-based framework. This combination not only identifies critical geochemical elements like Cu, Fe, and S but also provides coherent, domain-aligned explanations for the predicted mineralization patterns. 
Our findings demonstrate the transformative potential of combining LLMs, KGs, and ML models for mineral prediction. This hybrid approach enables geoscientists to leverage both computational and expert knowledge, achieving a deeper understanding of mineralization systems.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"27 ","pages":"Article 100259"},"PeriodicalIF":2.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144331437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Ashar Hussain, Venkatesh Budamala, Rajarshi Das Bhowmik
{"title":"Application of machine learning-based post-processing to improve crowd-sourced urban rainfall categorizations","authors":"Mohammad Ashar Hussain , Venkatesh Budamala , Rajarshi Das Bhowmik","doi":"10.1016/j.acags.2025.100255","DOIUrl":"10.1016/j.acags.2025.100255","url":null,"abstract":"<div><div>In recent years, citizen science has gained significant attention in the hydrometeorological sciences as an alternative to traditional monitoring systems while also raising awareness of natural processes. Crowd participation in reporting rainfall, known as crowdsourcing rainfall, has the potential to provide insights into the spatio-temporal variability of urban rainfall. However, crowdsourcing often suffers from inaccuracies in rainfall classification due to inadequately trained participants. This study investigates whether machine learning models can reduce misclassification in crowd-sourced rainfall reports under a synthetic framework. A state-of-the-art stochastic rainfall generator is deployed to simulate high-resolution rainfall over Bangalore, India, traditionally monitored by only two rain gauge stations. The study assumes that the 'synthetic' crowd reports qualitative descriptions of two rainfall characteristics—intensity and duration—based on which a categorization of a rainfall event (normal/moderate/severe) is issued. Ten scenarios are introduced to represent varying degrees of misclassification in the crowd reports. Two machine learning models, random forest and logistic regression, are employed to address these misclassifications and improve the resulting rainfall categorization. The findings indicate that while the random forest model outperforms logistic regression, its performance declines as misclassification rates increase. 
Moreover, the study highlights that increasing the number of participants significantly enhances the post-processing performance, emphasizing the importance of properly training the crowd for accurate reporting.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100255"},"PeriodicalIF":2.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suchanun Piriyasatit, Ercan Engin Kuruoglu, Mehmet Sinan Ozeren
{"title":"Comparison of ETAS parameter estimates across different time windows within the North and East Anatolian Fault Zones, Turkey","authors":"Suchanun Piriyasatit , Ercan Engin Kuruoglu , Mehmet Sinan Ozeren","doi":"10.1016/j.acags.2025.100253","DOIUrl":"10.1016/j.acags.2025.100253","url":null,"abstract":"<div><div>Located at the intersection of major lithospheric plates, Turkey is characterized by significant seismic activity, particularly along the North Anatolian Fault (NAF) and East Anatolian Fault (EAF). This paper employs the Epidemic-Type Aftershock Sequence (ETAS) model, fitted using the BFGS quasi-Newton method, to study earthquake triggering processes along these faults from 1990 to 2023. Our findings show distinct temporal variations in seismicity parameters along these faults. Along the NAF, the ETAS model highlighted a lower background seismicity rate (<span><math><mi>μ</mi></math></span>) and aftershock productivity (<span><math><msub><mrow><mi>K</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span>) compared to the EAF. In contrast, the EAF exhibits lower magnitude sensitivity (<span><math><mi>α</mi></math></span>), indicating that smaller earthquakes are more likely to trigger aftershocks, due to weaker dependence on mainshock magnitude. The aftershock decay rate (<span><math><mi>p</mi></math></span>) is notably faster in the NAF, suggesting quicker post-event stabilization. Our analysis across different time windows reveals significant non-stationarities in ETAS parameters, indicating that seismic behaviors along these faults do not strictly follow historical patterns. This temporal variability highlights the challenges in short-term seismic forecasting using historical data alone. 
A detailed comparison of ETAS parameters across time frames showcases the necessity for incorporating dynamic modeling approaches to improve earthquake forecasting in seismically active regions.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100253"},"PeriodicalIF":2.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144279364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Aouf, Eric Laloy, Bart Rogiers, Christophe De Vleeschouwer
{"title":"3D clay microstructure synthesis using Denoising Diffusion Probabilistic Models","authors":"Ali Aouf , Eric Laloy , Bart Rogiers , Christophe De Vleeschouwer","doi":"10.1016/j.acags.2025.100248","DOIUrl":"10.1016/j.acags.2025.100248","url":null,"abstract":"<div><div>This work is concerned with the challenging task of generating 3D-consistent binary microstructures of heterogeneous clay materials. We leverage denoising diffusion probabilistic models (DDPMs) to do so and show that DDPMs outperform two classical generative adversarial networks (GANs) for a 2D generation task. Next, our experiments demonstrate that our DDPMs can produce high-quality, diverse realizations that well capture the spatial statistics of two distinct clay microstructures. Moreover, we show that DDPMs can be implicitly trained to generate porosity-conditioned samples. To the best of our knowledge, this is the first study that addresses clay microstructure generation with DDPMs.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100248"},"PeriodicalIF":2.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144189514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kalpesh R. Patil, Takeshi Doi, J.V. Ratnam, Swadhin K. Behera
{"title":"Enhancing Indian summer monsoon prediction: Deep learning approach for skillful long-lead forecasts of rainfall","authors":"Kalpesh R. Patil, Takeshi Doi, J.V. Ratnam, Swadhin K. Behera","doi":"10.1016/j.acags.2025.100257","DOIUrl":"10.1016/j.acags.2025.100257","url":null,"abstract":"<div><div>The prediction of the Indian summer monsoon rainfall (ISMR) in the June–September (JJAS) season at long-lead times is challenging. The state-of-the-art dynamical models often fail to capture the sign and amplitude of the rainfall anomalies in the extreme rainfall seasons, limiting the overall skill of the models. We attempted to address this issue using a deep learning model based on convolutional neural networks (CNN). An ensemble of JJAS rainfall predictions using the CNN model with a unique custom function showed high skills in predicting ISMR at a long-lead time of 12 months. The predictions had an anomaly correlation coefficient (ACC) exceeding 0.5 at all the lead times from 2 to 17 months. The CNN model predictions could capture the sign and phase of the extreme rainfall events in the study period realistically. Analysis of saliency-based heatmaps indicated the high skill to be due to the model capturing the leading modes of climate variability, such as the Indian Ocean Dipole and El Niño-Southern Oscillation, realistically. 
The ensemble of CNN ISMR predictions can supplement the predictions of the forecasting centers.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100257"},"PeriodicalIF":2.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144291535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-driven dynamic friction models based on Recurrent Neural Networks","authors":"Gaëtan Cortes, Joaquin Garcia-Suarez","doi":"10.1016/j.acags.2025.100249","DOIUrl":"10.1016/j.acags.2025.100249","url":null,"abstract":"<div><div>In this concise contribution, it is demonstrated that Recurrent Neural Networks (RNNs) based on Gated Recurrent Unit (GRU) architecture, possess the capability to learn the complex dynamics of rate-and-state friction (RSF) laws from synthetic data. The data employed for training the network is generated through the application of traditional RSF equations coupled with either the aging law or the slip law for state evolution. A novel aspect of this approach is the formulation of a loss function that explicitly accounts for the direct effect by means of automatic differentiation. It is found that the GRU-based RNNs effectively learns to predict changes in the friction coefficient resulting from velocity jumps (with and without noise in the target data), thereby showcasing the potential of machine learning models in capturing and simulating the physics of frictional processes. Current limitations and challenges are discussed.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100249"},"PeriodicalIF":2.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sungil Kim, Tea-Woo Kim, Yongjun Hong, Hoonyoung Jeong
{"title":"Prediction of carbon dioxide phase at bottomhole by adaptive factorization network considering well geometry","authors":"Sungil Kim , Tea-Woo Kim , Yongjun Hong , Hoonyoung Jeong","doi":"10.1016/j.acags.2025.100254","DOIUrl":"10.1016/j.acags.2025.100254","url":null,"abstract":"<div><div>Accurate carbon dioxide (CO<sub>2</sub>) phase prediction at the bottomhole of injection wells is essential for ensuring safe and efficient CO<sub>2</sub> storage and enhanced gas recovery (EGR). Phase misclassification can cause operational inefficiencies, equipment failure, and compromised storage integrity, posing significant risks to CO<sub>2</sub> injection projects. While previous studies have contributed to CO<sub>2</sub> phase prediction, they have overlooked well geometry effects, which can impact reliability in real-world applications. This study addresses these challenges by introducing a deep learning framework based on the adaptive factorization network (AFN), which enhances CO<sub>2</sub> phase prediction accuracy by leveraging feature interactions. The AFN model was trained on ∼43,000 wells across seven major North American shale gas basins, covering a wide range of well geometries and injection conditions. CO<sub>2</sub> phases were classified into supercritical and dense categories, reflecting prevailing flow conditions. To enhance practical applicability, we incorporated real-field wellbore data, ensuring alignment with actual injection environments. The standard AFN model achieved an F1-score of 0.94, with data augmentation further improving performance by reducing false predictions by 50 % and increasing the F1-score to 0.97. Rigorous validation demonstrated the model's robustness for optimizing wellhead temperature to achieve the desired CO<sub>2</sub> phase transition. 
By explicitly considering well geometry effects and real-field conditions, this study advances data-driven CO<sub>2</sub> injection modeling, providing a scalable, high-accuracy framework for evaluating CO<sub>2</sub> storage and EGR feasibility.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100254"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144123572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soil organic carbon retrieval using a machine learning approach from satellite and environmental covariates in the Lower Brazos River Watershed, Texas, USA","authors":"Birhan Getachew Tikuye, Ram Lakhan Ray","doi":"10.1016/j.acags.2025.100252","DOIUrl":"10.1016/j.acags.2025.100252","url":null,"abstract":"<div><div>Soil is critical in global carbon storage, holding more carbon than terrestrial vegetation and the atmosphere combined. Accurate soil organic carbon (SOC) estimation is essential for improving agricultural productivity and mitigating climate change. This study aims to explore the retrieval of SOC using a machine learning (ML) approach, leveraging remote sensing data and environmental covariates, focusing on the Lower Brazos River Watershed, southern Texas, USA. The study used Sentinel 2A satellite data-derived indices such as vegetation and water indices, topographic features, soil properties, and climatic factors. Three ML models, namely Gradient Boosting (GB), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost), were deployed, with performance assessed using the R<sup>2</sup>, RMSE, and MAE. All explanatory variables are geospatial gridded datasets, except for the point-based measurement of SOC on the Prairie View A&M University (PVAMU) research farm plot used to train the model. The RF model demonstrated the best performance in model testing, with the lowest root mean square error (RMSE = 4.17) and mean absolute error (MAE = 3), as well as the highest coefficient of determination (R<sup>2</sup> = 0.78). GB was the second-best performing model, achieving an RMSE of 4.23 and an MAE of 3.12, with similar R<sup>2</sup> values to the RF model. The average SOC throughout the watershed is 45.5 tons/ha, while the total amount of SOC in the watershed is around 4,278,263 tons. 
These results suggest that integrating satellite data with environmental covariates and machine learning models holds excellent potential for SOC prediction and supports climate change mitigation efforts by improving carbon stock assessments.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100252"},"PeriodicalIF":2.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying detrital zircon U-Pb age distributions using automated machine learning","authors":"Jack W. Fekete , Glenn R. Sharman , Xiao Huang","doi":"10.1016/j.acags.2025.100251","DOIUrl":"10.1016/j.acags.2025.100251","url":null,"abstract":"<div><div>The prodigious use of detrital zircon U-Pb geochronology for provenance studies in recent decades has led many researchers to amass extensive datasets (>100,000 dates). When displayed as age distributions, individual samples are traditionally compared using visual inspection and statistical methods, which can become time-consuming and challenging when using large datasets. We propose that machine learning (ML) can more efficiently classify a sample by its source using detrital zircon U-Pb age distributions. Specifically, we hypothesize that automated machine learning (AutoML), which optimizes algorithm selection and hyperparameters, will outperform an unoptimized Random Forest (RF) classifier and the cross-correlation coefficient (R<sup>2</sup>), a commonly used metric for comparing age distributions. We test this approach using a well-constrained synthetic dataset and a natural dataset from the Jurassic-Eocene North American Cordillera. In synthetic experiments, AutoML models effectively classify samples by their sources when inter-source similarity across few sources is low to moderate and samples have more than ∼50 analyses. However, the effectiveness of AutoML is highly dependent on sample size and the variability of age modes within the data. Applied to the North American Cordillera dataset, AutoML achieves an ∼0.91 F<sub>1</sub> score when predicting between foreland and forearc basin tectonic settings and an ∼0.71 F<sub>1</sub> score when predicting subbasins within these settings, outperforming both RF and R<sup>2</sup>. 
Moreover, AutoML identifies discriminating age populations between groups, with the average feature importance of 100 models highlighting the 145–125 Ma age range, corresponding to a magmatic lull of the Cordilleran magmatic arc. These results demonstrate AutoML's potential as a powerful predictive and interpretive tool in detrital zircon studies.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100251"},"PeriodicalIF":2.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144131307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Araújo, F. López, S. Johansson, A. Westman, M. Bodin
{"title":"Efficient computation and visualization of ionospheric volumetric images for the enhanced interpretation of Incoherent scatter radar data","authors":"J. Araújo , F. López , S. Johansson , A. Westman , M. Bodin","doi":"10.1016/j.acags.2025.100245","DOIUrl":"10.1016/j.acags.2025.100245","url":null,"abstract":"<div><div>Incoherent scatter radar (ISR) techniques provide reliable measurements for the analysis of ionospheric plasma. Recent developments in ISR technologies allow the generation of high-resolution 3D data. Examples of such technologies employ the so-called phased-array antenna systems like the AMISR systems in North America or the upcoming EISCAT_3D in the Northern Fennoscandia region. EISCAT_3D will be capable of generating the highest resolution ISR datasets that have ever been measured. We present a novel fast computational strategy for the generation of high-resolution and smooth volumetric ionospheric images that represent ISR data. Through real-time processing, our computational framework will enable a fast decision-making during the monitoring process, where the experimental parameters are adapted in real time as the radars monitor specific phenomena. Real-time monitoring would allow the radar beams to be conveniently pointed at regions of interest and would therefore increase the science impact. We describe our strategy, which implements a flexible mesh generator along with an efficient interpolator specialized for ISR technologies. 
The proposed strategy is generic in the sense that it can be applied to a large variety of data sets and supports interactive visual analysis and exploration of ionospheric data, supplemented by interactive data transformations and filters.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"26 ","pages":"Article 100245"},"PeriodicalIF":2.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}