Hossein Ghorbanfekr, Pieter Jan Kerstens, Katrijn Dirix
{"title":"Classification of geological borehole descriptions using a domain adapted large language model","authors":"Hossein Ghorbanfekr, Pieter Jan Kerstens, Katrijn Dirix","doi":"10.1016/j.acags.2025.100229","DOIUrl":"10.1016/j.acags.2025.100229","url":null,"abstract":"<div><div>Geological borehole descriptions contain detailed textual information about the composition of the subsurface. However, their unstructured format presents significant challenges for extracting relevant features into a structured format. This paper introduces GEOBERTje: a domain adapted large language model trained on geological borehole descriptions from Flanders (Belgium) in the Dutch language. This model effectively extracts relevant information from the borehole descriptions and represents it in a numeric vector space. Showcasing just one potential application of GEOBERTje, we fine-tune a classifier model on a limited number of manually labeled observations. This classifier categorizes borehole descriptions into a main, second and third lithology class. We show that our classifier outperforms a rule-based approach (by 30% on average), non-contextual Word2Vec embeddings combined with a random forest classifier (by 38% on average), and a prompt engineering method with large language models (i.e., GPT-4 (by 11% on average) and Gemma 2 (by 28% on average)). This study exemplifies how domain adapted large language models enhance the efficiency and accuracy of extracting information from complex, unstructured geological descriptions. 
This offers new opportunities for geological analysis and modeling using vast amounts of data.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100229"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikhil Prakash , Andrea Manconi , Alessandro Cesare Mondini
{"title":"Rapid mapping of landslides using satellite SAR imagery: A progressive learning approach","authors":"Nikhil Prakash , Andrea Manconi , Alessandro Cesare Mondini","doi":"10.1016/j.acags.2025.100224","DOIUrl":"10.1016/j.acags.2025.100224","url":null,"abstract":"<div><div>Rapid detection of landslides after an exceptional event is critical for planning effective disaster management. Previous works have typically used machine learning-based methods, including the recently popular deep-learning approaches, to identify characteristic surface features from satellite remote sensing data, especially from optical images. However, data acquisition from optical images is not possible in cloudy conditions, leading to unpredictable delays in any mapping task from future events. These methods also rely on large manually labelled inventories for training, which are often not available before the event. In this work, we propose an active training strategy to generate a landslide map after an event using the first available synthetic-aperture radar (SAR) image and improve it once subsequent cloud-free optical images are acquired. The proposed active learning workflow can start with a small (∼100 m<sup>2</sup>) and incomplete inventory, and can grow the extent and completeness in iterative steps with manual updates after each step. This significantly reduces the slow manual mapping typically required for generating a large training inventory. We designed our experiments to map the landslides triggered by the M<sub>w</sub> 6.6 Hokkaido Eastern Iburi earthquake of 2018 in Japan using ALOS-2 (SAR) and PlanetScope (optical) scenes sequentially, in the order they were acquired. The choice of active learning prioritizes speed over accuracy. 
However, we note only a modest reduction in performance (∼10% drop in F1 and MCC scores), with our method allowing a preliminary landslide inventory to be completed within a single day. This is of major importance in disaster response, improving performance and reducing the potential subjectivity associated with manual mapping.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100224"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemical map classification in XMapTools","authors":"Pierre Lanari , Mahyra Tedeschi","doi":"10.1016/j.acags.2025.100230","DOIUrl":"10.1016/j.acags.2025.100230","url":null,"abstract":"<div><div>Chemical mapping using electron beam or laser instruments is an important analytical technique that allows the study of the compositional variability of materials in two dimensions. While quantitative compositional mapping of minerals has received considerable attention over the last two decades, pixel misclassification in commonly used software solutions remains a fundamental limitation affecting several applications. Calibration of intensity maps to fully quantitative compositional maps requires accurate classification, for example when a calibration curve is applied to a group of pixels that are assumed to have the same matrix behavior under the electron beam or the laser. This paper compares seven automated supervised machine learning classification algorithms implemented in the open source XMapTools software along with various tools for manual classification, for selecting training data and assessing the quality of a classification result. This new implementation aims to provide the research and industry communities with a free software tool for fast and robust classification of chemical maps. A standardized color scheme with reference colors for minerals and mineral groups is proposed to improve the readability of the classified maps in petrological studies. The performance of each algorithm varies depending on the data set, especially when minerals exhibit strong compositional zoning or when different minerals have similar compositions for a given element. 
The random forest algorithm based on bootstrap aggregation provides satisfactory results in most situations and is recommended for general use in XMapTools.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100230"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143509291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"lasertram: A Python library for time resolved analysis of laser ablation inductively coupled plasma mass spectrometry data","authors":"Jordan Lubbers , Adam J.R. Kent , Chris Russo","doi":"10.1016/j.acags.2025.100225","DOIUrl":"10.1016/j.acags.2025.100225","url":null,"abstract":"<div><div>Laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) data has a wide variety of uses in the geosciences for in-situ chemical analysis of complex natural materials. Improvements to instrument capabilities and operating software have drastically reduced the time required to generate large volumes of data relative to previous methodologies. Raw data from LA-ICP-MS, however, is in counts per unit time (typically counts per second), not elemental concentrations, and converting these count rates to concentrations requires additional processing. For complex materials where the ablated volume may contain a range of material compositions, a moderate amount of user input is also required if appropriate concentrations are to be accurately calculated. In geologic materials such as glasses and minerals that potentially have numerous heterogeneities (e.g., microlites or other inclusions) within them, this typically means determining whether the total ablation signal should be filtered to remove these heterogeneities. This necessitates a LA-ICP-MS data processing pipeline that is not automated, yet is still designed to enable rapid and efficient processing of large volumes of data.</div><div>Here we introduce lasertram, a Python library for the time resolved analysis of LA-ICP-MS data. We outline its mathematical theory and code structure, and provide an example of how it can be used to provide the time resolved analysis necessitated by LA-ICP-MS data of complex geologic materials. 
Throughout the lasertram pipeline we show how metadata and data are incrementally added to the objects created such that virtually any aspect of an experiment may be interrogated and its quality assessed. We also show that, when combined with other Python libraries for building graphical user interfaces, it can be utilized outside of a pure scripting environment. lasertram can be found at https://doi.org/10.5066/P1DZUR3Z.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100225"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143549376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Do more with less: Exploring semi-supervised learning for geological image classification","authors":"Hisham I. Mamode, Gary J. Hampson, Cédric M. John","doi":"10.1016/j.acags.2024.100216","DOIUrl":"10.1016/j.acags.2024.100216","url":null,"abstract":"<div><div>Labelled datasets within geoscience can often be small, with data acquisition both costly and challenging, and their interpretation and downstream use in machine learning difficult due to data scarcity. Deep learning algorithms require large datasets to learn a robust relationship between the data and its label and avoid overfitting. To overcome the paucity of data, transfer learning has been employed in classification tasks. But an alternative exists: there is often a large corpus of unlabeled data which may enhance the learning process. To evaluate this potential for subsurface data, we compare a high-performance semi-supervised learning (SSL) algorithm (SimCLRv2) with supervised transfer learning on a Convolutional Neural Network (CNN) in geological image classification.</div><div>We tested the two approaches on a classification task of sediment disturbance from cores of International Ocean Drilling Program (IODP) Expeditions 383 and 385. Our results show that semi-supervised transfer learning can be an effective strategy to adopt, with SimCLRv2 capable of producing representations comparable to those of supervised transfer learning. However, attempts to enhance the performance of semi-supervised transfer learning with task-specific unlabeled images during self-supervision degraded representations. Significantly, we demonstrate that SimCLRv2 trained on a dataset of core disturbance images can outperform supervised transfer learning of a CNN once a critical number of task-specific unlabeled images are available for self-supervision. 
The gain in performance compared to supervised transfer learning is 1% and 3% for binary and multi-class classification, respectively.</div><div>Supervised transfer learning can be deployed with comparative ease, whereas the current SSL algorithms such as SimCLRv2 require more effort. We recommend that SSL be explored in cases when large amounts of unlabeled task-specific images exist and improvements of a few percent in metrics matter. When examining small, highly specialized datasets, without large amounts of unlabeled images, supervised transfer learning might be the best strategy to adopt. Overall, SSL is a promising approach and future work should explore this approach utilizing different dataset types, quantity, and quality.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100216"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maureen Llinares, Ghislain Gassier, Sophie Viseur, Lucilla Benedetti
{"title":"A new inversion algorithm (PyMDS) based on the Pyro library to use chlorine 36 data as a paleoseismological tool on normal fault scarps","authors":"Maureen Llinares, Ghislain Gassier, Sophie Viseur, Lucilla Benedetti","doi":"10.1016/j.acags.2025.100234","DOIUrl":"10.1016/j.acags.2025.100234","url":null,"abstract":"<div><div>Paleoseismology (the study of earthquakes that occurred before records were kept and before instruments could record them) provides useful information such as recurrence periods and slip rate to assess seismic hazard and better understand fault mechanisms. Chlorine 36 is one of the paleoseismological tools that can be used to date scarp exhumation associated with earthquake events.</div><div>We propose an algorithm, PyMDS, that uses chlorine 36 data sampled on a fault scarp to retrieve seismic sequences (age and slip associated with each earthquake) and the long-term slip rate on a normal fault.</div><div>We show that the algorithm, based on Hamiltonian kernels, can successfully retrieve earthquakes and the long-term slip rate on a synthetic dataset. The precision on the ages varies from a few thousand years for old earthquakes (>5000 yr BP) down to a few hundred years for the most recent ones (<2000 yr BP). The resolution on the slip is ∼30–50 cm and on the slip rate is ∼1 mm/yr. Diagnostic tools (R<sub>hat</sub> and divergences on chains) are used to check the convergence of the results.</div><div>Our new code is applied to a site in Central Italy; the results agree with those obtained previously with another inversion procedure. We found four events, at 7800 ± 400, 4700 ± 400, 3000 ± 200 and 400 ± 20 yr BP, at the MA3 site. The associated slips were 130 ± 10 cm, 140 ± 20 cm, 580 ± 20 cm and 205 ± 20 cm. The results are comparable with those of a previous study by Schlagenhauf et al. (2010). The resulting slip rate of 2.7 ± 0.4 mm/yr is also consistent with the one determined by Tesson et al. 
(2020).</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100234"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143631906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Pragnath , G. Srijayanthi , Santosh Kumar , Sumer Chopra
{"title":"SeisAug: A data augmentation python toolkit","authors":"D. Pragnath , G. Srijayanthi , Santosh Kumar , Sumer Chopra","doi":"10.1016/j.acags.2025.100232","DOIUrl":"10.1016/j.acags.2025.100232","url":null,"abstract":"<div><div>A common limitation in applying deep learning and machine learning techniques is the scarcity of labelled data, which can be addressed through data augmentation (DA). SeisAug is a DA Python toolkit that addresses this challenge in seismological studies. DA helps to balance the imbalanced classes of a dataset by creating more examples of under-represented classes. It significantly mitigates overfitting by increasing the volume of training data and introducing variability, thereby improving the model's performance on unseen data. Given the rapid advancements in deep learning for seismology, ‘SeisAug’ assists in extensibility by generating a substantial amount of data (2–6 times more data) which can aid in developing an indigenous robust model. Further, this study demonstrates the role of DA in developing a robust model. For this, we used a basic two-class identification model that distinguishes earthquake (signal) from noise (non-earthquake). The model is trained with the original, 1-times and 5-times augmented datasets, and the performance metrics are evaluated. The model trained with the 5-times augmented dataset performs significantly better, with an accuracy of 0.991, AUC of 0.999 and AUC-PR of 0.999, compared to the model trained with the original dataset, with an accuracy of 0.50, AUC of 0.75 and AUC-PR of 0.80. 
Furthermore, by making all codes available on GitHub, the toolkit facilitates the easy application of DA techniques, empowering end-users to enhance their seismological waveform datasets effectively and overcome the initial drawbacks posed by the scarcity of labelled data.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100232"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143591747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic segmentation framework for atoll satellite imagery: An in-depth exploration using UNet variants and Segmentation Gym","authors":"Ray Wang , Tahiya Chowdhury , Alejandra C. Ortiz","doi":"10.1016/j.acags.2024.100217","DOIUrl":"10.1016/j.acags.2024.100217","url":null,"abstract":"<div><div>This paper presents a framework for semantic segmentation of satellite imagery aimed at studying atoll morphometrics. Recent advances in deep neural networks for automated segmentation have been valuable across a variety of satellite and aerial imagery applications, such as land cover classification, mineral characterization, and disaster impact assessment. However, identifying an appropriate segmentation approach for geoscience research remains challenging, often relying on trial-and-error experimentation for data preparation, model selection, and validation. Building on prior efforts to create reproducible research pipelines for aerial image segmentation, we propose a systematic framework for custom segmentation model development using Segmentation Gym, a software tool designed for efficient model experimentation. Additionally, we evaluate state-of-the-art U-Net model variants to identify the most accurate and precise model for specific segmentation tasks. Using a dataset of 288 Landsat images of atolls as a case study, we conduct a detailed analysis of various annotation techniques, image types, and training methods, offering a structured framework for practitioners to design and explore segmentation models. Furthermore, we address dataset imbalance, a common challenge in geographical data, and discuss strategies to mitigate its impact on segmentation outcomes. 
Based on our findings, we provide recommendations for applying this framework to other geoscience research areas to address similar challenges.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100217"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher G. Smith , Julie Bernier , Alisha M. Ellis , Kathryn E.L. Smith
{"title":"Predictive regressive models of recent marsh sediment thickness improve the quantification of coastal marsh sediment budgets","authors":"Christopher G. Smith , Julie Bernier , Alisha M. Ellis , Kathryn E.L. Smith","doi":"10.1016/j.acags.2024.100215","DOIUrl":"10.1016/j.acags.2024.100215","url":null,"abstract":"<div><div>Coastal marsh wetlands experience variations in vertical gains and losses through time, which have allowed them to infill relict topography and record variations in drivers. The stratigraphic unit associated with the development of the marsh also reflects the long-term importance of key ecosystem services supplied by the marsh environment, including carbon storage and storm mitigation. Mapping these coastal wetland sediments and the marsh unit thickness is challenging as traditional coastal geophysical tools are not easily deployable (acoustic methods) or are unreliable in saline-soil environments (e.g., ground-penetrating radar), leaving core-based methods the most viable mapping method. In the present study, we utilized prior information on the geologic architecture of the region to select spatial and physical metrics that likely persisted throughout evolution of the marsh during the late Holocene. We then assessed the individual and collective power of these metrics to predict marsh thickness observed from cores. Employing regressive predictive models powered by these data, we improve the quantification of marsh thickness for a coastal fringing marsh within the Grand Bay estuary in Mississippi and Alabama (USA). The information gained from this approach yields improved estimates of the carbon stocks in this environment. Additionally, the stored sediment masses reflect the past, and potential future, persistence of the Grand Bay marsh under historical and present marsh-estuarine sediment exchange fluxes. 
Such improvements to both the sediment budget of recent marsh stratigraphic units and the spatial extent provide new resources for comparison with large-scale landscape models, the latter of which may be used, when validated, to predict future change and ecosystem transformations.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100215"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relationships between fault friction, slip time, and physical parameters explored by experiment-based friction model: A machine learning approach using recurrent neural networks (RNNs)","authors":"Tae-Hoon Uhmb , Yohei Hamada , Takehiro Hirose","doi":"10.1016/j.acags.2025.100231","DOIUrl":"10.1016/j.acags.2025.100231","url":null,"abstract":"<div><div>Understanding the relationship between fault friction and physical parameters is crucial for comprehending earthquake physics. Despite various friction models developed to explain this relationship, representing the relationships in a friction model with greater detail remains a challenge due to intricate correlations, including the nonlinear interplay between physical parameters and friction. Here we develop new models to define the relationship between various physical parameters (slip velocity, axial displacement, temperature, rate of temperature, and rate of axial displacement), friction coefficient, and slip time. The models are established by utilizing Recurrent Neural Networks (RNNs) to analyze continuous data in high-velocity rotary shear experiments (HVR), as reported by previous work. The experiments were conducted on diorite specimens at a slip velocity of 0.004 m/s under various normal stresses (0.3–5.8 MPa). Under these conditions, frictional heating inevitably occurs at the sliding surface, reaching temperatures of up to 68 °C. We first identified the optimal model by assessing its accuracy in relation to the time interval for defining friction. Following this, we explored the relationship between friction and physical parameters with varying slip time and conditions by analyzing the gradient importance of physical parameters within the identified model. 
Our results demonstrate that the importance of physical parameters continuously shifts over slip time and conditions, and that temperature stands out as the most influential parameter affecting fault friction under the slip conditions of this study, which are accompanied by frictional heating. Our study demonstrates the potential of deep learning analysis in enhancing our understanding of complex frictional processes, contributing to the development of more refined friction models and improving predictive models for earthquake physics.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100231"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}