A large-scale, high-quality dataset for lithology identification: Construction and applications

IF 6.1; Zone 1, Engineering Technology; JCR Q2, ENERGY & FUELS
Jia-Yu Li, Ji-Zhou Tang, Xian-Zheng Zhao, Bo Fan, Wen-Ya Jiang, Shun-Yao Song, Jian-Bing Li, Kai-Da Chen, Zheng-Guang Zhao
Petroleum Science, Volume 22, Issue 8, Pages 3207-3228. Journal Article. Published 2025-08-01.
DOI: 10.1016/j.petsci.2025.04.013
URL: https://www.sciencedirect.com/science/article/pii/S1995822625001402
Citations: 0

Abstract

Lithology identification is a critical aspect of geoenergy exploration, including geothermal energy development, gas hydrate extraction, and gas storage. In recent years, artificial intelligence techniques based on drill core images have made significant strides in lithology identification, achieving high accuracy. However, the current demand for advanced lithology identification models remains unmet due to the lack of high-quality drill core image datasets. This study constructs and publicly releases the first open-source Drill Core Image Dataset (DCID), addressing the need for large-scale, high-quality datasets in lithology characterization tasks within geological engineering and establishing a standard dataset for model evaluation. DCID consists of 35 lithology categories and a total of 98,000 high-resolution images (512 × 512 pixels), making it the most comprehensive drill core image dataset in terms of lithology categories, image quantity, and resolution. This study also provides lithology identification accuracy benchmarks on DCID for popular convolutional neural networks (CNNs) such as VGG, ResNet, DenseNet, and MobileNet, as well as for the Vision Transformer (ViT) and MLP-Mixer. Additionally, the sensitivity of model performance to various parameters and image resolution is evaluated. In response to real-world challenges, we propose a real-world data augmentation (RWDA) method, leveraging slightly defective images from DCID to enhance model robustness. The study also explores the impact of real-world lighting conditions on the performance of lithology identification models. Finally, we demonstrate how to rapidly evaluate model performance across multiple dimensions using low-resolution datasets, advancing the application and development of new lithology identification models for geoenergy exploration.
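As an illustration of how a low-resolution copy of a 512 × 512 image might be produced for rapid evaluation, the following is a minimal sketch only. The abstract does not say how the authors derive their low-resolution datasets; block averaging, the function name `downsample`, and the nested-list grayscale image format are all assumptions made here for illustration.

```python
def downsample(image, factor):
    """Shrink a 2D grayscale image by averaging non-overlapping blocks.

    Assumption: `image` is a list of equal-length rows of numbers. Block
    averaging is one common choice, not necessarily the authors' method.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Sum every pixel in the factor x factor block, then average.
            block_sum = sum(
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            row.append(block_sum / (factor * factor))
        result.append(row)
    return result


# Hypothetical usage: reduce a blank 512 x 512 image to 64 x 64.
full_res = [[0] * 512 for _ in range(512)]
low_res = downsample(full_res, 8)
```

Under these assumptions, a factor of 8 reduces the pixel count per image by 64 times, which is the kind of saving that would make quick comparisons across many models practical.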
Source journal: Petroleum Science (Earth Sciences: Geochemistry and Geophysics)
CiteScore: 7.70
Self-citation rate: 16.10%
Articles published: 311
Review time: 63 days
Journal description: Petroleum Science is the only English-language journal in China on petroleum science and technology that is intended for professionals engaged in petroleum science research and technical applications all over the world, as well as the managerial personnel of oil companies. It covers petroleum geology, petroleum geophysics, petroleum engineering, petrochemistry & chemical engineering, petroleum mechanics, and economic management. It aims to introduce the latest results in oil industry research in China, promote cooperation in petroleum science research between China and the rest of the world, and build a bridge for scientific communication between China and the world.