Addressing Data Gaps for Facility Reliability Assessments Using Non-Hierarchical Cluster Analysis

Risk Management. Pub Date: 2022-09-26. DOI: 10.1115/ipc2022-87145

Ryan Stewart, Martin Di Blasi, T. Dessein


Citations: 1

Abstract



Performing reliability assessments for a large asset inventory of pipeline facility equipment, such as compressor station assets, requires a substantial dataset of attributes for a diverse range of equipment types. In many cases, equipment data inventories have gaps, with one or more required attributes unknown, such as diameter, wall thickness, operating pressure or material properties. Identifying and collecting complete records is typically labor-intensive and time-consuming, so data gaps are often filled with assumptions while data collection is ongoing. A standard approach to filling these gaps is to use conservative assumptions for missing attributes. As a result, records with missing data are assessed at higher risk than complete records. The benefit of this conservative approach is that it appropriately penalizes the incomplete records, driving action toward collecting the information where it matters. However, this approach is simplistic: it does not leverage all the information available within the dataset, and it can produce a distorted representation of risk that may reduce the credibility of the risk assessment.

This paper describes a process that uses unsupervised machine learning algorithms to organize large asset inventories into groups and fill data gaps with reasonable but conservative assumptions. We used a non-hierarchical clustering method to group asset records into clusters. Instead of using the most conservative value across all records to fill data gaps, gaps are filled using the most conservative value from similar records. This method provides estimates for data gaps that are more realistic while still maintaining conservatism, striking a balance between prioritizing equipment with confirmed attributes that indicate higher risk and equipment with little information.

The approach described in this study relies on K-means clustering. We discuss the practical uses of dimensionality reduction, heuristic techniques for selecting the number of clusters, and sensitivity analysis.
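The cluster-wise conservative fill described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields (`diameter_in`, `pressure_psi`, `wall_mm`), the toy values, the choice of the thinnest wall as the "conservative" value, the z-score scaling, and the deterministic farthest-point initialization are all assumptions made for illustration. It also assumes the clustering features are known for every record.

```python
import math

# Hypothetical inventory: wall_mm is missing (None) for two records.
records = [
    {"id": "A1", "diameter_in": 2.0, "pressure_psi": 150.0, "wall_mm": 3.9},
    {"id": "A2", "diameter_in": 3.0, "pressure_psi": 160.0, "wall_mm": 5.5},
    {"id": "A3", "diameter_in": 2.0, "pressure_psi": 155.0, "wall_mm": None},
    {"id": "B1", "diameter_in": 24.0, "pressure_psi": 950.0, "wall_mm": 9.5},
    {"id": "B2", "diameter_in": 30.0, "pressure_psi": 1000.0, "wall_mm": 12.7},
    {"id": "B3", "diameter_in": 24.0, "pressure_psi": 980.0, "wall_mm": None},
]

def standardize(rows):
    """Z-score each column so no single attribute dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point start (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels

def fill_conservative(recs, labels, attr, conservative=min):
    """Fill missing attr with the most conservative known value in the cluster.

    Falls back to the most conservative value over all records when a
    cluster has no known value for attr.
    """
    all_known = [r[attr] for r in recs if r[attr] is not None]
    filled = []
    for rec, lab in zip(recs, labels):
        rec = dict(rec)  # copy so the input inventory is left untouched
        if rec[attr] is None:
            known = [r[attr] for r, l in zip(recs, labels)
                     if l == lab and r[attr] is not None]
            rec[attr] = conservative(known or all_known)
        filled.append(rec)
    return filled

features = standardize([(r["diameter_in"], r["pressure_psi"]) for r in records])
labels = kmeans(features, k=2)
filled = fill_conservative(records, labels, "wall_mm")
global_fill = min(r["wall_mm"] for r in records if r["wall_mm"] is not None)

for rec, lab in zip(filled, labels):
    print(rec["id"], "cluster", lab, "wall_mm", rec["wall_mm"])
print("global conservative fill would use", global_fill)
```

In this toy inventory the large-diameter record B3 receives the thinnest wall observed among similar large-diameter records, whereas a global conservative fill would assign it the thinnest wall in the whole inventory, which comes from a small-diameter record. In practice a library implementation such as scikit-learn's `KMeans` would likely replace the hand-written loop, and the number of clusters would be chosen with a heuristic rather than fixed at 2.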

