Enhancing SAM-based digital rock image segmentation via edge-semantics fusion
Ziqiang Wang, Zhiyu Hou, Danping Cao
Applied Computing and Geosciences, Volume 28, Article 100292. Published 2025-09-20.
DOI: 10.1016/j.acags.2025.100292
URL: https://www.sciencedirect.com/science/article/pii/S2590197425000746
Journal impact factor: 3.2 (JCR Q2, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
The Segment Anything Model (SAM) demonstrates strong segmentation capabilities. However, its application to digital rock images is hindered by subtle transitions between matrix minerals and pore structures and by the inherent heterogeneity of rock samples. Both lead to mis-segmentation and discontinuities that affect petrophysical characterization and numerical modeling of subsurface reservoir properties. To address these challenges, we introduce ESF-SAM (Edge-Semantics Fusion-SAM), a novel approach that enhances SAM's segmentation fidelity by integrating edge and semantic features. Specifically, in ESF-SAM, semantic features from SAM's image encoder are processed through an edge decoder enhanced by progressive dilated convolutions to extract detailed structural boundaries. The resulting edge features and the original semantic features are adaptively fused through a dual-attention mechanism: spatial gating attention dynamically balances their contributions across locations, and channel attention recalibrates feature importance. This spatial–channel attention framework enriches the feature representation and enables targeted fine-tuning within the SAM decoder, preserving global segmentation capability while significantly improving local boundary delineation in two-phase segmentation tasks. Experimental results demonstrate that ESF-SAM improves segmentation detail, leading to more accurate derivation of key rock properties such as elastic modulus and pore geometry parameters, with results that align more closely with labeled data than those of the original SAM. Trained on only a small number of annotated sandstone images, ESF-SAM effectively adapts to the target domain and exhibits strong generalization when applied to carbonate rock images without additional fine-tuning. This work exemplifies how integrating priors into foundation models can substantially enhance their applicability to complex scientific imaging tasks.
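The dual-attention fusion described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify layer shapes or the exact gating formula, so the sketch assumes a per-pixel sigmoid gate computed from the concatenated features and a squeeze-and-excitation style channel reweighting. All function names and weight arrays (`spatial_gate_fuse`, `channel_attention`, `fuse`, `w_gate`, `w1`, `w2`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def spatial_gate_fuse(sem, edge, w_gate, b_gate=0.0):
    """Blend semantic and edge features with one gate value per pixel.

    sem, edge: feature maps of shape (C, H, W).
    w_gate: shape (2*C,), a hypothetical 1x1-convolution weight that maps the
    concatenated features to a single gate logit at each location.
    """
    stacked = np.concatenate([sem, edge], axis=0)             # (2C, H, W)
    logits = np.tensordot(w_gate, stacked, axes=([0], [0]))   # (H, W)
    gate = sigmoid(logits + b_gate)                           # values in (0, 1)
    # gate near 1 favours edge features, gate near 0 favours semantic features
    mixed = gate[None, :, :] * edge + (1.0 - gate[None, :, :]) * sem
    return mixed, gate


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style recalibration (an assumed design).

    x: (C, H, W); w1: (C_r, C); w2: (C, C_r), with C_r a reduced width.
    """
    squeezed = x.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU bottleneck -> (C_r,)
    scale = sigmoid(w2 @ hidden)              # per-channel weight -> (C,)
    return x * scale[:, None, None], scale


def fuse(sem, edge, w_gate, w1, w2):
    """Spatial gating followed by channel recalibration."""
    mixed, gate = spatial_gate_fuse(sem, edge, w_gate)
    out, scale = channel_attention(mixed, w1, w2)
    return out, gate, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 4, 8, 8                         # toy sizes, not SAM's real dimensions
    sem = rng.normal(size=(C, H, W))
    edge = rng.normal(size=(C, H, W))
    out, gate, scale = fuse(
        sem,
        edge,
        w_gate=rng.normal(size=2 * C),
        w1=rng.normal(size=(C // 2, C)),
        w2=rng.normal(size=(C, C // 2)),
    )
    print(out.shape, gate.shape, scale.shape)
```

In a trained model the weights would be learned jointly with the decoder; here they are random and only show the data flow. With `w_gate` set to zeros the gate is 0.5 everywhere, so the spatial step reduces to a plain average of the two feature maps, which is a convenient sanity check.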