{"title":"Do more with less: Exploring semi-supervised learning for geological image classification","authors":"Hisham I. Mamode, Gary J. Hampson, Cédric M. John","doi":"10.1016/j.acags.2024.100216","DOIUrl":"10.1016/j.acags.2024.100216","url":null,"abstract":"<div><div>Labelled datasets within geoscience can often be small, with data acquisition both costly and challenging, and their interpretation and downstream use in machine learning difficult due to data scarcity. Deep learning algorithms require large datasets to learn a robust relationship between the data and its label and avoid overfitting. To overcome the paucity of data, transfer learning has been employed in classification tasks. But an alternative exists: there is often a large corpus of unlabeled data which may enhance the learning process. To evaluate this potential for subsurface data, we compare a high-performance semi-supervised learning (SSL) algorithm (SimCLRv2) with supervised transfer learning on a Convolutional Neural Network (CNN) in geological image classification.</div><div>We tested the two approaches on a classification task of sediment disturbance from cores of International Ocean Drilling Program (IODP) Expeditions 383 and 385. Our results show that semi-supervised transfer learning can be an effective strategy to adopt, with SimCLRv2 capable of producing representations comparable to those of supervised transfer learning. However, attempts to enhance the performance of semi-supervised transfer learning with task-specific unlabeled images during self-supervision degraded representations. Significantly, we demonstrate that SimCLRv2 trained on a dataset of core disturbance images can outperform supervised transfer learning of a CNN once a critical number of task-specific unlabeled images are available for self-supervision. 
The gain in performance compared to supervised transfer learning is 1% and 3% for binary and multi-class classification, respectively.</div><div>Supervised transfer learning can be deployed with comparative ease, whereas the current SSL algorithms such as SimCLRv2 require more effort. We recommend that SSL be explored in cases when large amounts of unlabeled task-specific images exist and improvements of a few percent in metrics matter. When examining small, highly specialized datasets, without large amounts of unlabeled images, supervised transfer learning might be the best strategy to adopt. Overall, SSL is a promising approach, and future work should explore it using datasets of different types, sizes, and quality.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100216"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"lasertram: A Python library for time resolved analysis of laser ablation inductively coupled plasma mass spectrometry data","authors":"Jordan Lubbers, Adam J.R. Kent, Chris Russo","doi":"10.1016/j.acags.2025.100225","DOIUrl":"10.1016/j.acags.2025.100225","url":null,"abstract":"<div><div>Laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) data has a wide variety of uses in the geosciences for in-situ chemical analysis of complex natural materials. Improvements to instrument capabilities and operating software have drastically reduced the time required to generate large volumes of data relative to previous methodologies. Raw data from LA-ICP-MS, however, is in counts per unit time (typically counts per second), not elemental concentrations, and converting these count rates to concentrations requires additional processing. For complex materials where the ablated volume may contain a range of material compositions, a moderate amount of user input is also required if appropriate concentrations are to be accurately calculated. In geologic materials such as glasses and minerals that potentially have numerous heterogeneities (e.g., microlites or other inclusions) within them, this typically means determining whether the total ablation signal should be filtered to remove these heterogeneities. This requires an LA-ICP-MS data processing pipeline that is not automated, yet is designed to enable rapid and efficient processing of large volumes of data.</div><div>Here we introduce lasertram, a Python library for the time resolved analysis of LA-ICP-MS data. We outline its mathematical theory and code structure, and provide an example of how it can be used to provide the time resolved analysis necessitated by LA-ICP-MS data of complex geologic materials. 
Throughout the lasertram pipeline we show how metadata and data are incrementally added to the objects created such that virtually any aspect of an experiment may be interrogated and its quality assessed. We also show that, when combined with other Python libraries for building graphical user interfaces, it can be utilized outside of a pure scripting environment. lasertram can be found at https://doi.org/10.5066/P1DZUR3Z.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100225"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143549376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic segmentation framework for atoll satellite imagery: An in-depth exploration using UNet variants and Segmentation Gym","authors":"Ray Wang, Tahiya Chowdhury, Alejandra C. Ortiz","doi":"10.1016/j.acags.2024.100217","DOIUrl":"10.1016/j.acags.2024.100217","url":null,"abstract":"<div><div>This paper presents a framework for semantic segmentation of satellite imagery aimed at studying atoll morphometrics. Recent advances in deep neural networks for automated segmentation have been valuable across a variety of satellite and aerial imagery applications, such as land cover classification, mineral characterization, and disaster impact assessment. However, identifying an appropriate segmentation approach for geoscience research remains challenging, often relying on trial-and-error experimentation for data preparation, model selection, and validation. Building on prior efforts to create reproducible research pipelines for aerial image segmentation, we propose a systematic framework for custom segmentation model development using Segmentation Gym, a software tool designed for efficient model experimentation. Additionally, we evaluate state-of-the-art U-Net model variants to identify the most accurate and precise model for specific segmentation tasks. Using a dataset of 288 Landsat images of atolls as a case study, we conduct a detailed analysis of various annotation techniques, image types, and training methods, offering a structured framework for practitioners to design and explore segmentation models. Furthermore, we address dataset imbalance, a common challenge in geographical data, and discuss strategies to mitigate its impact on segmentation outcomes. 
Based on our findings, we provide recommendations for applying this framework to other geoscience research areas to address similar challenges.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100217"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive regressive models of recent marsh sediment thickness improve the quantification of coastal marsh sediment budgets","authors":"Christopher G. Smith, Julie Bernier, Alisha M. Ellis, Kathryn E.L. Smith","doi":"10.1016/j.acags.2024.100215","DOIUrl":"10.1016/j.acags.2024.100215","url":null,"abstract":"<div><div>Coastal marsh wetlands experience variations in vertical gains and losses through time, which have allowed them to infill relict topography and record variations in drivers. The stratigraphic unit associated with the development of the marsh also reflects the long-term importance of key ecosystem services supplied by the marsh environment, including carbon storage and storm mitigation. Mapping these coastal wetland sediments and the marsh unit thickness is challenging as traditional coastal geophysical tools are not easily deployable (acoustic methods) or are unreliable in saline-soil environments (e.g., ground-penetrating radar), leaving core-based methods as the most viable mapping approach. In the present study, we utilized prior information on the geologic architecture of the region to select spatial and physical metrics that likely persisted throughout the evolution of the marsh during the late Holocene. We then assessed the individual and collective power of these metrics to predict marsh thickness observed from cores. Employing regressive predictive models powered by these data, we improve the quantification of marsh thickness for a coastal fringing marsh within the Grand Bay estuary in Mississippi and Alabama (USA). The information gained from this approach yields improved estimates of the carbon stocks in this environment. Additionally, the stored sediment masses reflect the past, and potential future, persistence of the Grand Bay marsh under historical and present marsh-estuarine sediment exchange fluxes. 
Such improvements to both the sediment budget of recent marsh stratigraphic units and the spatial extent provide new resources for comparison with large-scale landscape models, the latter of which may be used, when validated, to predict future change and ecosystem transformations.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100215"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relationships between fault friction, slip time, and physical parameters explored by experiment-based friction model: A machine learning approach using recurrent neural networks (RNNs)","authors":"Tae-Hoon Uhmb, Yohei Hamada, Takehiro Hirose","doi":"10.1016/j.acags.2025.100231","DOIUrl":"10.1016/j.acags.2025.100231","url":null,"abstract":"<div><div>Understanding the relationship between fault friction and physical parameters is crucial for comprehending earthquake physics. Despite various friction models developed to explain this relationship, representing the relationships in a friction model with greater detail remains a challenge due to intricate correlations, including the nonlinear interplay between physical parameters and friction. Here we develop new models to define the relationship between various physical parameters (slip velocity, axial displacement, temperature, rate of temperature, and rate of axial displacement), friction coefficient, and slip time. The models are established by utilizing Recurrent Neural Networks (RNNs) to analyze continuous data from high-velocity rotary shear (HVR) experiments, as reported by previous work. The experiments were conducted on diorite specimens at a slip velocity of 0.004 m/s under various normal stresses (0.3–5.8 MPa). Under these conditions, frictional heating inevitably occurs at the sliding surface, reaching temperatures up to 68 °C. We first identified the optimal model by assessing its accuracy in relation to the time interval for defining friction. Following this, we explored the relationship between friction and physical parameters with varying slip time and conditions by analyzing the gradient importance of physical parameters within the identified model. 
Our results demonstrate that the importance of physical parameters continuously shifts over slip time and conditions, and that temperature stands out as the most influential parameter affecting fault friction under the slip conditions of this study, which are accompanied by frictional heating. Our study demonstrates the potential of deep learning analysis in enhancing our understanding of complex frictional processes, contributing to the development of more refined friction models and improving predictive models for earthquake physics.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100231"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"X-ray Micro-CT based characterization of rock cuttings with deep learning","authors":"Nils Olsen, Yifeng Chen, Pascal Turberg, Alexandre Moreau, Alexandre Alahi","doi":"10.1016/j.acags.2025.100220","DOIUrl":"10.1016/j.acags.2025.100220","url":null,"abstract":"<div><div>Compared to rock core samples, rock cuttings from destructive boreholes are a common and cheaper source of drilling material that can be used to determine underground geology. Manually classifying a series of cuttings can be a long and tedious process and is prone to subjectivity that leads to errors. In this paper, a framework for the classification of multiple types of rock structures is introduced based on rock cutting images from X-ray micro-CT technology. The classification is performed using a simple yet effective deep learning model (a ResNet-18 architecture) to categorize five different lithologies: micritic limestone, bioclastic limestone, oolithic limestone, molassic sandstone and gneiss. The proposed network is trained on two datasets (laboratory and borehole), both containing the five lithologies and comprising over 10 000 images. The laboratory dataset consists of well-controlled experiments with homogeneous samples, and the borehole dataset of heterogeneous samples corresponding to a real case application. Among all the considered approaches, including ResNet-34, SPP-CNN, and manual classification by human experts, ResNet-18 demonstrates superior performance across multiple evaluation metrics, including precision, recall, and F1-score. To the best of our knowledge, this is the first time a test comparing deep neural network and human performance has been performed for this task. To optimize the performance of the proposed model, the transfer learning method is implemented. Furthermore, the experiments demonstrate that when employing transfer learning, the size of the dataset significantly impacts the performance of the model. 
In the studied design, the experimental results confirm that the proposed approach is a cost-effective and efficient method for automated rock cutting classification using the micro-CT technique, and that it can be easily adapted to classify rock cuttings of various types and from various sources.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100220"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143165478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geological object recognition in legacy maps through data augmentation and transfer learning techniques","authors":"Wenjia Li, Weilin Chen, Jiyin Zhang, Chenhao Li, Xiaogang Ma","doi":"10.1016/j.acags.2025.100233","DOIUrl":"10.1016/j.acags.2025.100233","url":null,"abstract":"<div><div>Maps are crucial tools in geosciences, providing detailed representations of the spatial distribution and relationships among geological features. Accurate recognition and classification of geological objects within these maps are essential for applications in resource exploration, environmental management, and geological hazard assessment. Over the years, many legacy geological maps have accumulated, and many of them are not in data formats ready for machines to read and analyze. The inherent diversity and complexity of geological features, combined with the labor-intensive process of manual annotation, pose significant challenges in the usage of those maps. This study addresses these challenges by proposing an innovative approach that leverages legend data for data augmentation and employs transfer learning techniques to improve the quality of object recognition. Legend data from geological maps offer standardized symbols and annotations. Using them to augment existing datasets increases the diversity and volume of training data, thereby enhancing the model's ability to generalize across various geological contexts. A deep learning model called EfficientNet is then fine-tuned using the augmented dataset to recognize and classify geological features more accurately. The model's performance is evaluated based on accuracy, recall, and F1-score, with results showing significant improvements, particularly for datasets with texture-rich information. The proposed method demonstrates that the combination of data augmentation and transfer learning significantly enhances the accuracy and efficiency of geological object recognition. 
This approach not only reduces the manual effort needed for geological object recognition but also contributes to the advancement of geological mapping and analysis.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100233"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143578333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Streamlining geoscience data analysis with an LLM-driven workflow","authors":"Jiyin Zhang, Cory Clairmont, Xiang Que, Wenjia Li, Weilin Chen, Chenhao Li, Xiaogang Ma","doi":"10.1016/j.acags.2024.100218","DOIUrl":"10.1016/j.acags.2024.100218","url":null,"abstract":"<div><div>Large Language Models (LLMs) have made significant advancements in natural language processing and human-like response generation. However, training and fine-tuning an LLM to fit the strict requirements in the scope of academic research, such as geoscience, still requires significant computational resources and human expert alignment to ensure the quality and reliability of the generated content. These challenges highlight the need for a more flexible and reliable LLM workflow to meet domain-specific analysis needs. This study proposes an LLM-driven workflow that addresses the challenges of utilizing LLMs in geoscience data analysis. The work was built upon the open data API (application programming interface) of Mindat, one of the largest databases in mineralogy. We designed and developed an open-source LLM-driven workflow that processes natural language requests and automatically utilizes the Mindat API, mineral co-occurrence network analysis, and locality distribution heat map visualization to conduct geoscience data analysis tasks. Using prompt engineering techniques, we developed a supervisor-based agentic framework that enables LLM agents not only to interpret context information but also to autonomously address complex geoscience analysis tasks, bridging the gap between automated workflows and human expertise. This agentic design emphasizes autonomy, allowing the workflow to adapt seamlessly to future advancements in LLM capabilities without requiring additional fine-tuning or domain-specific embedding. 
By providing the comprehensive context of the task in the workflow and the professional tool, we ensure the quality of LLM-generated content without the need to embed geoscience knowledge into LLMs through fine-tuning or human alignment. Our approach integrates LLMs into geoscience data analysis, addressing the need for specialized tools while reducing the learning curve through LLM-driven interactions between users and APIs. This streamlined workflow enhances the efficiency of exploratory data analysis, as demonstrated by the several use cases presented. In our future work we will explore the scalability of this workflow through the integration of additional agents and diverse geoscience data sources.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100218"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143166138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skillful prediction of Indian Ocean Dipole index using machine learning models","authors":"J.V. Ratnam, Swadhin K. Behera, Masami Nonaka, Kalpesh R. Patil","doi":"10.1016/j.acags.2025.100228","DOIUrl":"10.1016/j.acags.2025.100228","url":null,"abstract":"<div><div>In this study, we evaluated six machine learning models for their skill in predicting the Indian Ocean Dipole (IOD). The results based on the IOD index predictions at 1–8 month lead time indicate that the AdaBoost model with Multi-Layer Perceptron as the base estimator, AdaBoost(MLP), performs better than the other five models in predicting the IOD index at all lead times. Interestingly, the IOD predictions of AdaBoost(MLP) had an anomaly correlation coefficient above 0.6 at almost all lead times. The results suggest that the AdaBoost(MLP) machine learning model is a promising tool for predicting the IOD index with a long lead time of 8 months. Analysis revealed that the machine learning model predictions are aided by the signals from the Pacific region, owing to co-occurrences of some of the IODs with El Niño-Southern Oscillation events.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100228"},"PeriodicalIF":2.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating empirical analysis and deep learning for accurate monsoon prediction in Kerala, India","authors":"Yajnaseni Dash, Ajith Abraham","doi":"10.1016/j.acags.2024.100211","DOIUrl":"10.1016/j.acags.2024.100211","url":null,"abstract":"<div><div>Kerala, a coastal state in India characterized by its humid tropical monsoon climate, is profoundly influenced by the Western Ghats and the Arabian Sea. Kerala receives significant rainfall during both the southwest monsoon (June to September, JJAS) and the northeast monsoon (October to December, OND) seasons. Given the substantial impact of rainfall on the state's economy and livelihoods, accurate precipitation forecasting is of critical importance. Although Kerala's annual rainfall is approximately 2.5 times higher than the national average, the state frequently experiences water scarcity due to rapid runoff into the Arabian Sea. This study builds upon previous research concerning Kerala's rainfall patterns and introduces a novel approach to improving rainfall predictions. A hybrid model that integrates Empirical Mode Decomposition (EMD) with Detrended Fluctuation Analysis (DFA) and deep Long Short-Term Memory (LSTM) neural networks demonstrates enhanced precision in forecasting. 
Thus, by integrating empirical data analysis with advanced deep learning techniques, this research offers a robust framework for predicting rainfall in Kerala, making a significant contribution to the field of climate informatics and providing practical benefits for the region's economy and environmental management.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"24 ","pages":"Article 100211"},"PeriodicalIF":2.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143137714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}