Intelligent countermeasures analysis in oil and gas projects utilizing topic modeling
Authors: Ehab Elhosary, Osama Moselhi
Journal: Journal of Loss Prevention in The Process Industries, vol. 98, Article 105751 (IF 4.2; JCR Q2, Engineering, Chemical)
DOI: 10.1016/j.jlp.2025.105751
Published: 2025-08-06
URL: https://www.sciencedirect.com/science/article/pii/S0950423025002098
Citations: 0
Abstract
The oil and gas industry is inherently complex and high-risk: potential fires, explosions, and releases of hazardous substances pose significant safety challenges. Despite robust safety management systems, accidents persist, which underscores the importance of learning from past incidents and hazard reports. Historical Hazard and Operability (HAZOP) reports generate valuable countermeasures (safeguards and recommendations) that inform the design of protection systems and strengthen safety management. However, the sheer volume of countermeasures makes addressing each one prohibitively expensive and time-consuming. In addition, current HAZOP literature and software tools do not automate the handling of these countermeasures, which impedes the efficient dissemination of information to the appropriate departments for detailed design. This paper introduces a method for categorizing countermeasures using BERTopic, a topic-modeling algorithm from natural language processing (NLP). The methodology comprises data preprocessing, Sentence-BERT (SBERT, a modification of Bidirectional Encoder Representations from Transformers) for generating embeddings, Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) for clustering, and KeyBERT for topic representation. Applied to 1574 records from a HAZOP report of an oil pump station, the BERTopic model achieved a coherence score of 84.6% and a topic diversity score of 90.7%, resulting in 15 final topics and outperforming Latent Dirichlet Allocation (LDA; 45.3% and 84.7%) and Latent Semantic Analysis (LSA; 53% and 96%). The study identified which topics were included and excluded for each node, as well as the most frequent topics by risk rate. The generated safety systems (SS) were validated against the API RP 750 and RP 752 standards, and a Countermeasures Breakdown Structure (CBS) was introduced to organize safety systems hierarchically. The developed model was then tested on a second dataset from an oil and gas production facility, comprising 512 records and 21 nodes, where it achieved 85.29% coherence and 98.33% topic diversity, confirming its robustness and consistency. This research benefits HAZOP participants by improving hazard identification, emphasizing key preventative actions, and assigning them to the relevant departments for design-stage deployment.
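The abstract reports topic diversity scores without stating how they are computed. The sketch below assumes the definition commonly used in the topic-modeling literature: the fraction of unique words among the top-k words of all topics. The function name `topic_diversity`, the `top_k` parameter, and the example topics are hypothetical illustrations, not taken from the paper, and the authors' exact computation may differ.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Return the share of unique words among the top_k words of every topic.

    Assumed definition: unique top words / total top words.
    1.0 means no word is shared between topics; values near 0 mean
    the topics largely repeat the same words.
    """
    # Flatten the first top_k words of each topic into one list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("need at least one topic word")
    return len(set(top_words)) / len(top_words)


# Hypothetical countermeasure topics, each given as ranked keywords.
example_topics = [
    ["valve", "relief", "pressure"],
    ["alarm", "level", "pressure"],
    ["fire", "detector", "alarm"],
]

# 7 unique words out of 9 top words in total.
print(f"{topic_diversity(example_topics, top_k=3):.3f}")
```

In this toy case "pressure" and "alarm" each appear in two topics, so the score falls below 1. Under this assumed definition, a reported diversity of 90.7% would mean that roughly nine in ten top words are unique to a single topic.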
About the journal:
The broad scope of the journal is process safety. Process safety is defined as the prevention and mitigation of process-related injuries and damage arising from process incidents involving fire, explosion and toxic release. Such undesired events occur in the process industries during the use, storage, manufacture, handling, and transportation of highly hazardous chemicals.