GeoCrack: A High-Resolution Dataset For Segmentation of Fracture Edges in Geological Outcrops
Mohammed Yaqoob, Mohammed Ishaq, Mohammed Yusuf Ansari, Venkata Ram Sagar Konagandla, Tamim Al Tamimi, Stefano Tavani, Amerigo Corradetti, Thomas Daniel Seers
Scientific Data 11(1), 1318. Journal article, published 2024-12-03. DOI: 10.1038/s41597-024-04107-0
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11615390/pdf/
Abstract
GeoCrack is the first large-scale, open-source annotated dataset of fracture traces from geological outcrops. It enables deep learning-based fracture segmentation and sets a new standard for natural fracture characterization datasets. GeoCrack contains images from photogrammetric surveys of fractured rock exposures at 11 sites in Europe and the Middle East, capturing diverse lithologies and tectonic settings. Each image was cleaned, normalized, and manually segmented, and the annotations then went through a recursive vetting process to ensure the quality and accuracy of the digitized fracture edges. The processed images and their corresponding binary masks were divided into 224 × 224 pixel patches, yielding 12,158 image/mask pairs. GeoCrack captures representative real-world challenges in fracture edge annotation, such as variations in contrast between fracture traces and the host medium caused by geological and geomorphological factors, including aperture dilation, host rock composition, outcrop weathering, and groundwater staining. Physical occlusions such as shadows and vegetation are also taken into account to help minimize false positives. GeoCrack was validated with a U-Net implementation for fracture segmentation, which achieved an intersection over union (IoU) of 85%. GeoCrack has strong potential to advance deep learning-based fracture segmentation in geological applications and to address the diverse challenges of real-world fracture identification.
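The two technical steps named in the abstract, cutting aligned image/mask pairs into 224 × 224 patches and scoring a segmentation by IoU, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes non-overlapping tiles and discards partial tiles at the right and bottom borders (the abstract does not say how borders were handled), and it computes fracture-class IoU for a single mask pair (the abstract does not say how the reported 85% was aggregated across patches). The helper names `tile_pairs` and `iou` are hypothetical.

```python
import numpy as np

PATCH = 224  # patch edge length (pixels) reported in the abstract


def tile_pairs(image: np.ndarray, mask: np.ndarray, patch: int = PATCH):
    """Cut an image and its binary mask into aligned, non-overlapping patch pairs.

    Assumption (hypothetical helper): regions at the right/bottom borders that
    do not fill a whole patch are discarded.
    """
    if image.shape[:2] != mask.shape[:2]:
        raise ValueError("image and mask must share height and width")
    h, w = mask.shape[:2]
    pairs = []
    # Step by `patch` so tiles do not overlap; the upper bound keeps only full tiles.
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            # Identical slices keep each mask patch pixel-aligned with its image patch.
            pairs.append((
                image[top:top + patch, left:left + patch],
                mask[top:top + patch, left:left + patch],
            ))
    return pairs


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two binary masks (fracture pixels are nonzero)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(np.logical_and(pred, target).sum() / union)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(500, 700, 3)).astype(np.uint8)
    mask = np.zeros((500, 700), dtype=np.uint8)
    mask[100:110, :] = 1  # synthetic horizontal "fracture", 10 px wide
    pairs = tile_pairs(image, mask)
    print(len(pairs))  # 2 rows x 3 columns of full tiles = 6

    pred = np.zeros_like(mask)
    pred[100:105, :] = 1  # prediction recovers half of the fracture width
    print(iou(pred, mask))  # 0.5
```

Slicing the image and mask with the same indices is what turns one annotated photograph into many supervised (image, mask) training pairs; a real pipeline might instead pad borders or use overlapping tiles, which would change the patch count.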
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.