Botao Lin, Yan Jin, Qianwen Cao, Han Meng, Huiwen Pang, Shiming Wei
Natural Gas Industry B, Volume 12, Issue 2, Pages 110-122. Published 2025-04-01. DOI: 10.1016/j.ngib.2025.03.007
https://www.sciencedirect.com/science/article/pii/S235285402500021X
Developing a large language model for oil- and gas-related rock mechanics: Progress and challenges
In recent years, large language models (LLMs) have demonstrated considerable potential to improve work efficiency and decision-making in practical applications. However, few specialized LLMs have been developed for oil and gas engineering. To support the exploration and development of deep and ultra-deep unconventional reservoirs, there is a need for a customized LLM for oil- and gas-related rock mechanics that can handle complex professional data and make intelligent predictions and decisions. To that end, we first review general and industry-specific LLMs. We then propose a systematic workflow for building this domain-specific LLM for oil and gas engineering, comprising data collection and processing, model construction and training, model validation, and implementation in the specific domain. Three application scenarios are investigated: knowledge extraction from textual resources, field operation with multidisciplinary integration, and intelligent decision assistance. Finally, several challenges in developing this domain-specific LLM are highlighted. Our key findings are as follows. Geological surveys, laboratory experiments, field tests, and numerical simulations form the four original sources of rock mechanics data. These data must pass through collection, storage, processing, and governance before being fed into LLM training. The domain-specific LLM can be trained by fine-tuning a general open-source LLM with professional data and constraints such as rock mechanics datasets and principles. The LLM can then follow commonly used training and validation processes before being deployed in the oil and gas field. However, building this domain-specific LLM poses three primary challenges: data standardization, data security and access, and striking a balance between physics and data when designing the model structure. Some of these challenges are administrative rather than technical, and overcoming them requires close collaboration among the interested parties and professional practitioners.
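
The data flow named in the abstract (four original sources, then collection, storage, processing, and governance before training) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Source`, `Record`, `advance`, and `ready_for_training` are hypothetical, and the strict stage ordering is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Source(Enum):
    """The four original sources of rock mechanics data named in the abstract."""
    GEOLOGICAL_SURVEY = "geological survey"
    LABORATORY_EXPERIMENT = "laboratory experiment"
    FIELD_TEST = "field test"
    NUMERICAL_SIMULATION = "numerical simulation"


# Stages a record passes through before LLM training.
# Treating them as a strict sequence is an assumption of this sketch.
STAGES = ("collection", "storage", "processing", "governance")


@dataclass
class Record:
    """Hypothetical container for one piece of rock mechanics data."""
    source: Source
    text: str
    stages: List[str] = field(default_factory=list)  # stages completed so far


def advance(record: Record, stage: str) -> Record:
    """Mark one stage as completed, rejecting out-of-order stages."""
    done = len(record.stages)
    if done >= len(STAGES):
        raise ValueError("record has already completed every stage")
    expected = STAGES[done]
    if stage != expected:
        raise ValueError(f"expected stage {expected!r}, got {stage!r}")
    record.stages.append(stage)
    return record


def ready_for_training(record: Record) -> bool:
    """A record is eligible for training only after all four stages."""
    return tuple(record.stages) == STAGES


if __name__ == "__main__":
    sample = Record(Source.FIELD_TEST, "placeholder field-test note")
    for name in STAGES:
        advance(sample, name)
    print(sample.source.value, ready_for_training(sample))
```

Keeping governance as the last gate means a record that has only been collected and stored cannot reach training by accident, which mirrors the abstract's point that data security and access must be resolved before the data are used.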