Synthetic Data Generation Using Generative Adversarial Network (GAN) for Burst Failure Risk Analysis of Oil and Gas Pipelines

Impact factor: 1.8 (JCR Q2, Engineering, Multidisciplinary)
R. K. Mazumder, Gourav Modanwal, Yue Li
DOI: 10.1115/1.4062741 (https://doi.org/10.1115/1.4062741)
Journal: ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems Part B-Mechanical Engineering
Published: 2023-06-15 (Journal Article)
Citations: 0

Abstract

Although pipelines are the safest mode of oil and gas transportation, the pipeline failure rate has increased significantly over the last decade, particularly for aging pipelines. Predicting failure risk and prioritizing the riskiest assets within a large pipeline inventory is a demanding task for utilities. Machine learning (ML) has recently shown promising results in pipeline failure risk prediction. However, because of safety and security concerns, obtaining enough operational and failure data to train ML models accurately is a significant challenge. To overcome this limited access to operational data, this study employed a Generative Adversarial Network (GAN)-based framework to generate synthetic pipeline data (DSyn, N = 100) from a 70% subset of experimental burst test data (DExp, N = 92) compiled from the literature. The framework was tested on (1) real data and (2) combined real and synthetic data. The burst failure risk of corroded oil and gas pipelines was determined using probabilistic approaches, and pipelines were classified into two classes: (1) low risk (0 ≤ pf ≤ 0.5) and (2) high risk (pf > 0.5). Two Random Forest (RF) models, MExp and MComb, were trained on the experimental training subset (DExp, N = 64) and on the combined data (DExp + DSyn, N = 164), respectively. Both models were validated on the remaining 30% of the experimental data (N = 28). The validation results indicate that adding synthetic data can improve ML model performance: the area under the ROC curve (AUC) was 0.96 for the real-data model (MExp) and 0.99 for the combined-data model (MComb). The better-performing combined model can support strategic resilience improvement planning for oil and gas pipelines, which sets long-term critical decisions on pipe maintenance and potential replacement.
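The abstract labels each pipe by its burst failure probability pf and scores the Random Forest models by ROC AUC, but it does not name the burst-pressure model or the input distributions. The sketch below illustrates only those two steps, under loudly stated assumptions: a DNV-RP-F101-style single-defect capacity equation without partial safety factors (assumed, not confirmed as the authors' model), normally distributed defect depth and tensile strength with hypothetical coefficients of variation, a Monte Carlo estimate of pf for the limit state g = P_burst - p_op, the 0.5 threshold stated in the abstract, and a rank-based (Mann-Whitney) AUC. All function names and numeric inputs are hypothetical; the GAN and Random Forest steps are not reproduced.

```python
import math
import random

# Illustrative sketch only. The burst model (DNV-RP-F101 style, no safety
# factors) and the normal input distributions are ASSUMPTIONS, not the
# authors' documented method.


def burst_pressure(D, t, d, L, uts):
    """Burst pressure (MPa) of a pipe with a single corrosion defect (assumed model).

    D: outside diameter (mm), t: wall thickness (mm), d: defect depth (mm),
    L: defect length (mm), uts: ultimate tensile strength (MPa).
    """
    q = math.sqrt(1.0 + 0.31 * (L / math.sqrt(D * t)) ** 2)  # length correction factor
    return (2.0 * t * uts / (D - t)) * (1.0 - d / t) / (1.0 - d / (t * q))


def failure_probability(D, t, d, L, uts, p_op, cov_d=0.10, cov_uts=0.05,
                        n=10_000, seed=42):
    """Monte Carlo estimate of pf = P(g < 0), with limit state g = P_burst - p_op."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        # Sample defect depth and strength (hypothetical CoVs); clip the depth
        # so the defect never reaches or exceeds the wall thickness.
        d_s = min(max(rng.gauss(d, cov_d * d), 0.0), 0.99 * t)
        uts_s = rng.gauss(uts, cov_uts * uts)
        if burst_pressure(D, t, d_s, L, uts_s) - p_op < 0.0:
            failures += 1
    return failures / n


def risk_class(pf):
    """Two classes from the abstract: low (0 <= pf <= 0.5), high (pf > 0.5)."""
    return "high" if pf > 0.5 else "low"


def roc_auc(y_true, scores):
    """ROC AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("roc_auc needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical pipe: 508 mm OD, 10 mm wall, UTS 500 MPa, 10 MPa operating pressure.
    shallow = failure_probability(508, 10, 2, 100, 500, p_op=10)
    deep = failure_probability(508, 10, 8, 300, 500, p_op=10)
    print(shallow, risk_class(shallow))
    print(deep, risk_class(deep))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

In the study's setup, the scores passed to `roc_auc` would be each model's predicted probability of the high-risk class on the 28 held-out experimental pipes, evaluated once for MExp and once for MComb. The pairwise count is O(n_pos × n_neg), which is negligible at N = 28.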
Source journal metrics: CiteScore 5.20; self-citation rate 13.60%; articles published: 34