Overcoming Challenges for Quantitative Risk Modeling Using Machine Learned Data Correlations and Predictive Modeling

S. Biagiotti, Dan Williams, Sergiy Kondratyuk, Brett Johnson
Volume 2: Pipeline and Facilities Integrity, published 2022-09-26. DOI: 10.1115/ipc2022-87236

Abstract

After 20 years of learning and successful risk reduction, the pipeline industry is striving for the next step change in risk performance by migrating from relative index-based risk models toward probabilistic approaches that cover all threats and all pipeline segments in a system-wide risk assessment. For certain threats and pipeline segments, risk can be readily quantified from in-line inspection (ILI) results coupled with probabilistic limit state modeling. Where such data is lacking for other segments or threats, a meaningful methodology is needed to establish the probability of failure for every dynamic segment used to quantify risk. Establishing probability-based threat assessments for these other segments involves four stages: (1) identify correlations based on historical ILI results (i.e., create the “training” dataset); (2) leverage classification trees to identify statistically relevant observations of key variables; (3) apply machine learning techniques to develop probabilistic and/or causal models that predict target outcomes from combinations of the key variables; and (4) apply the relationships established in stages 1, 2 and 3 to all assets lacking ILI results. Although the process seems straightforward, several challenges stand in the way of seamless implementation. This paper reviews each process step and provides guidance on preparing for and tackling common data intelligence challenges. It also proposes strategies other than indexing for classifying assets in situations where machine learning does not reveal statistically significant correlations between variables or does not yield strong predictions of target outcomes.
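The four stages can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: segments are described by hypothetical categorical attributes (`coating`, `soil`), the target is a 0/1 flag meaning an ILI run reported a feature of interest, and the model is a small Gini-impurity classification tree whose leaves store the observed rate among inspected segments. The names `build_tree`, `best_split` and `predict` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]  # hypothetical: categorical attributes of one segment


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows: List[Row], labels: List[int], features: List[str],
               min_leaf: int) -> Optional[Tuple[str, str]]:
    """Find the (feature, value) equality test that most reduces weighted Gini impurity."""
    best, best_score = None, gini(labels)
    n = len(labels)
    for f in features:
        for v in sorted({r[f] for r in rows}):  # sorted for deterministic tie-breaking
            match = [y for r, y in zip(rows, labels) if r[f] == v]
            other = [y for r, y in zip(rows, labels) if r[f] != v]
            # Skip splits that would leave too few inspected segments on either side.
            if len(match) < min_leaf or len(other) < min_leaf:
                continue
            score = (len(match) * gini(match) + len(other) * gini(other)) / n
            if score < best_score:
                best, best_score = (f, v), score
    return best


def build_tree(rows: List[Row], labels: List[int], features: List[str],
               max_depth: int = 3, min_leaf: int = 3) -> dict:
    """Stages 2-3 (simplified): grow a small tree whose leaves store observed rates."""
    split = best_split(rows, labels, features, min_leaf) if max_depth > 0 else None
    if split is None:
        return {"rate": sum(labels) / len(labels), "n": len(labels)}
    f, v = split

    def subset(want_match: bool):
        pairs = [(r, y) for r, y in zip(rows, labels) if (r[f] == v) == want_match]
        return [r for r, _ in pairs], [y for _, y in pairs]

    return {
        "feature": f,
        "value": v,
        "match": build_tree(*subset(True), features, max_depth - 1, min_leaf),
        "other": build_tree(*subset(False), features, max_depth - 1, min_leaf),
    }


def predict(tree: dict, row: Row) -> float:
    """Stage 4: route a segment without ILI to a leaf; unseen values follow 'other'."""
    while "rate" not in tree:
        branch = "match" if row.get(tree["feature"]) == tree["value"] else "other"
        tree = tree[branch]
    return tree["rate"]


# Stage 1: hypothetical training set from ILI-inspected segments
# (label 1 = the ILI run reported the feature of interest on that segment).
ili_segments = [
    ({"coating": "tape", "soil": "wet"}, 1),
    ({"coating": "tape", "soil": "dry"}, 1),
    ({"coating": "tape", "soil": "wet"}, 1),
    ({"coating": "tape", "soil": "dry"}, 0),
    ({"coating": "FBE", "soil": "wet"}, 0),
    ({"coating": "FBE", "soil": "dry"}, 0),
    ({"coating": "FBE", "soil": "wet"}, 0),
    ({"coating": "FBE", "soil": "dry"}, 1),
]
rows = [r for r, _ in ili_segments]
labels = [y for _, y in ili_segments]
tree = build_tree(rows, labels, features=["coating", "soil"])

# Stage 4: apply the learned relationships to segments that have no ILI results.
uninspected = [{"coating": "tape", "soil": "dry"}, {"coating": "FBE", "soil": "wet"}]
estimates = [predict(tree, seg) for seg in uninspected]
print(estimates)  # [0.75, 0.25]
```

Here the tree splits on coating because soil carries no signal in this toy data, and the `min_leaf` guard stops further splits that would rest on only two inspected segments. A leaf rate is a relative frequency, not yet a calibrated probability of failure. In practice a library model such as scikit-learn's `DecisionTreeClassifier` (whose `min_samples_leaf` parameter plays a similar role) would likely replace this hand-rolled tree.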
Although understanding of the migration from relative index-based to probabilistic risk algorithms is still at an early stage, this paper shares lessons learned on “how” to gather the “evidence” needed to identify statistical dependencies and how to apply data confidence metrics within the decision process. It also discusses challenges in data preparation/QC and interpretation techniques, and offers suggestions for determining the limitations that should be applied to the outcomes when quantifying risk.
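The abstract also refers to gathering evidence of statistical dependencies, applying data confidence metrics, and limiting outcomes. The following Python sketch shows one hedged way those ideas could fit together; it is not the paper's method. It assumes a permutation test on the spread of per-group rates as the dependency check, a hypothetical 0 to 1 `confidence` score per segment, and analyst-chosen `lower`/`upper` bounds. All function names, data and thresholds are hypothetical.

```python
import random
from typing import Dict, Sequence, Tuple


def rate_gap(groups: Sequence[str], labels: Sequence[int]) -> float:
    """Test statistic: spread between the highest and lowest per-group rate."""
    totals: Dict[str, Tuple[int, int]] = {}
    for g, y in zip(groups, labels):
        n, k = totals.get(g, (0, 0))
        totals[g] = (n + 1, k + y)
    rates = [k / n for n, k in totals.values()]
    return max(rates) - min(rates)


def permutation_p_value(groups: Sequence[str], labels: Sequence[int],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Share of label shuffles whose rate gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = rate_gap(groups, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between group and outcome
        if rate_gap(groups, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never exactly 0


def bounded_estimate(model_rate: float, pooled_rate: float, confidence: float,
                     lower: float, upper: float) -> float:
    """Shrink toward the pooled rate as data confidence falls, then clamp to bounds."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    blended = confidence * model_rate + (1.0 - confidence) * pooled_rate
    return min(max(blended, lower), upper)


ALPHA = 0.05  # hypothetical significance threshold

# Hypothetical ILI outcomes by coating type (1 = feature of interest reported).
coating = ["tape"] * 10 + ["FBE"] * 10
outcome = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9
p_coating = permutation_p_value(coating, outcome)

# An attribute with identical rates in every group carries no evidence of dependency.
soil = ["wet", "dry"] * 4
flags = [1, 1, 0, 0, 0, 0, 1, 1]
p_soil = permutation_p_value(soil, flags)

use_coating = p_coating < ALPHA  # evidence supports using coating
use_soil = p_soil < ALPHA        # no evidence: soil would not drive a split

pooled = sum(outcome) / len(outcome)
estimate = bounded_estimate(0.8, pooled, confidence=0.5, lower=0.0, upper=0.5)
print(p_soil, estimate)  # 1.0 0.5
```

An attribute such as `soil` in this toy data, with no detectable dependency, corresponds to the situation in which the abstract says classification strategies other than indexing are needed. The clamp in `bounded_estimate` merely stands in for whatever limits an operator decides to place on model outcomes.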