A large-scale, high-quality dataset for lithology identification: Construction and applications
Jia-Yu Li, Ji-Zhou Tang, Xian-Zheng Zhao, Bo Fan, Wen-Ya Jiang, Shun-Yao Song, Jian-Bing Li, Kai-Da Chen, Zheng-Guang Zhao
Petroleum Science, vol. 22, no. 8, pp. 3207-3228 (published 1 August 2025). DOI: 10.1016/j.petsci.2025.04.013
Lithology identification is a critical aspect of geoenergy exploration, including geothermal energy development, gas hydrate extraction, and gas storage. In recent years, artificial intelligence techniques based on drill core images have made significant strides in lithology identification and achieved high accuracy. However, the demand for advanced lithology identification models remains unmet because high-quality drill core image datasets are lacking. This study constructs and publicly releases the first open-source Drill Core Image Dataset (DCID). DCID addresses the need for large-scale, high-quality datasets in lithology characterization tasks within geological engineering and establishes a standard dataset for model evaluation. It consists of 35 lithology categories and a total of 98,000 high-resolution images (512 × 512 pixels), making it the most comprehensive drill core image dataset in terms of lithology categories, image quantity, and resolution. Based on DCID, this study also provides lithology identification accuracy benchmarks for popular convolutional neural networks (CNNs) such as VGG, ResNet, DenseNet, and MobileNet, as well as for the Vision Transformer (ViT) and MLP-Mixer. Additionally, the sensitivity of model performance to various parameters and to image resolution is evaluated. To address real-world challenges, we propose a real-world data augmentation (RWDA) method that leverages slightly defective images from DCID to enhance model robustness. The study also explores the impact of real-world lighting conditions on the performance of lithology identification models. Finally, we demonstrate how to rapidly evaluate model performance across multiple dimensions using low-resolution datasets, advancing the application and development of new lithology identification models for geoenergy exploration.
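The abstract does not say how the low-resolution datasets or the lighting conditions were produced. The sketch below illustrates the general idea under stated assumptions: low-resolution copies of a 512 × 512 image are made by block averaging, and a lighting change is modeled as a global brightness scale plus a gamma curve. The helpers `downsample_mean`, `resolution_pyramid`, and `adjust_lighting` are hypothetical names introduced here for illustration; this is not the authors' RWDA method or their evaluation pipeline, and random pixels stand in for real DCID images.

```python
import numpy as np


def downsample_mean(image: np.ndarray, factor: int) -> np.ndarray:
    """Shrink an H x W (or H x W x C) image by an integer factor using block averaging."""
    h, w = image.shape[:2]
    if factor < 1 or h % factor or w % factor:
        raise ValueError("height and width must be divisible by a positive factor")
    channels = image.shape[2] if image.ndim == 3 else 1
    # Split each spatial axis into (blocks, pixels-per-block) and average inside every block.
    blocks = image.reshape(h // factor, factor, w // factor, factor, channels)
    pooled = blocks.mean(axis=(1, 3))
    return pooled.reshape(h // factor, w // factor, *image.shape[2:])


def resolution_pyramid(image: np.ndarray, sizes=(256, 128, 64)) -> dict:
    """Hypothetical helper: build square low-resolution copies of a square image."""
    side = image.shape[0]
    return {s: downsample_mean(image, side // s) for s in sizes}


def adjust_lighting(image: np.ndarray, brightness: float = 1.0, gamma: float = 1.0) -> np.ndarray:
    """Hypothetical lighting change: global brightness scale, then a gamma curve, on a uint8 image."""
    x = image.astype(np.float64) / 255.0
    x = np.clip(brightness * x, 0.0, 1.0) ** gamma
    return np.round(x * 255.0).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random pixels stand in for one 512 x 512 RGB drill core image; no DCID data is loaded.
    core = rng.integers(0, 256, size=(512, 512, 3)).astype(np.uint8)
    for size, small in resolution_pyramid(core).items():
        # Fraction of the original pixel count a model would process at this size.
        print(size, small.shape, (size / 512) ** 2)
    dim = adjust_lighting(core, brightness=0.6)
    print(core.mean() > dim.mean())
```

Each level of the pyramid processes 1/4, 1/16, and 1/64 of the original pixels, which is why low-resolution copies can make repeated benchmarking cheaper. Block averaging is chosen here only because it is simple and exact for integer factors; a real pipeline might use a different resampling filter.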
About the journal:
Petroleum Science is the only English-language journal in China devoted to petroleum science and technology. It is intended for professionals engaged in petroleum science research and technical applications worldwide, as well as for managerial personnel of oil companies. It covers petroleum geology, petroleum geophysics, petroleum engineering, petrochemistry and chemical engineering, petroleum mechanics, and economic management. It aims to present the latest results of oil industry research in China, promote cooperation in petroleum science research between China and the rest of the world, and build a bridge for scientific communication between China and the world.