Cognitive HSE Risk Prediction and Notification Tool Based on Natural Language Processing

Day 3 Thu, September 23, 2021. Pub Date: 2021-09-15. DOI: 10.2118/205877-ms

Tharunya Danabal, Neethi Sarah John, Abhijeet Pramod Ghawade, Pranjal Ahire


Abstract

The focus of this work is on developing a cognitive tool that predicts the most frequent HSE hazards with the highest potential severity levels. The tool identifies these risks using a natural language processing algorithm on HSE leading and lagging indicator reports submitted to an oilfield services company’s global HSE reporting system. The purpose of the tool is to prioritize proactive actions and provide focus to raise workforce awareness.

A natural language processing algorithm was developed to identify priority HSE risks based on potential severity levels and frequency of occurrence. The algorithm uses vectorization, compression, and clustering methods to categorize the risks by potential severity and frequency using a formulated risk index methodology. In the pilot study, a user interface was developed to configure the frequency and the number of the prioritized HSE risks that are to be communicated from the tool to those employees who opted to receive the information in a given location.

From this pilot study using data reported in the company’s online HSE reporting system, the algorithm successfully identified five priority HSE risks across different hazard categories based on the risk index. Using a high volume of reporting data, the risk index factored multiple coefficients such as severity levels, frequency and cluster tightness to prioritize the HSE risks.
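As a rough illustration of how a risk index of this kind might combine its coefficients, the sketch below scores each cluster of reports from its mean potential severity, its share of all reports, and a tightness term. The abstract does not state the formula, so the multiplicative form, the `Cluster` fields, the 1 to 5 severity scale, and the `top_risks` helper are all assumptions made for illustration and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical summary of one cluster of similar HSE reports."""
    label: str
    severities: List[int]   # assumed potential-severity level per report (1 = low, 5 = high)
    mean_distance: float    # assumed mean pairwise distance inside the cluster, in [0, 1]


def risk_index(cluster: Cluster, total_reports: int) -> float:
    """Illustrative index: mean severity x share of reports x tightness."""
    mean_severity = sum(cluster.severities) / len(cluster.severities)
    frequency = len(cluster.severities) / total_reports
    tightness = 1.0 - cluster.mean_distance  # tighter clusters score higher
    return mean_severity * frequency * tightness


def top_risks(clusters: List[Cluster], n: int) -> List[Cluster]:
    """Return the n clusters with the highest illustrative risk index."""
    total = sum(len(c.severities) for c in clusters)
    ranked = sorted(clusters, key=lambda c: risk_index(c, total), reverse=True)
    return ranked[:n]


# Made-up example clusters, not data from the paper.
example_clusters = [
    Cluster("slips and trips", [4, 4, 4, 4], 0.2),
    Cluster("dropped objects", [5, 5], 0.5),
    Cluster("housekeeping", [1, 1], 0.1),
]
priority = top_risks(example_clusters, 2)
```

The parameter `n` plays the role of the configurable number of prioritized risks mentioned in the pilot study; a weighted sum or any other monotone combination of the three terms would serve equally well as a placeholder.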
The observations at each stage of the developed algorithm are as follows:

In the data cleaning stage, all stop words (such as a, and, the) were removed, followed by tokenization to divide text in the HSE reports into tokens and remove punctuation.

In the vectorization stage, many vectors were formed using the Term Frequency - Inverse Document Frequency (TF-IDF) method.

In the compression stage, an autoencoder removed the noise from the input data.

In the agglomerative clustering stage, HSE reports with similar words were grouped into clusters, and the number of clusters generated per category ranged from three to five.

The novelty of this approach is its ability to prioritize a location’s HSE risks using an algorithm containing natural language processing techniques. This cognitive tool treats reported HSE information as data to identify and flag priority HSE risks, factoring in the frequency of similar reports and their associated severity levels. The proof of concept has demonstrated the potential ability of the tool. The next stage would be to test predictive capabilities for injury prevention.
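The cleaning, vectorization, and clustering stages described above can be sketched in a few dozen lines of dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the stop-word list, the smoothed IDF variant, cosine distance, average linkage, and the sample reports are all illustrative choices, and the autoencoder compression stage is omitted (it would sit between vectorization and clustering).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "from", "near"}


def clean(text):
    """Lowercase, keep alphabetic tokens only (drops punctuation), remove stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """One sparse, L2-normalized TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # One common IDF variant: log(N / df) + 1 (assumed, not taken from the paper).
        vec = {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity for two L2-normalized sparse vectors."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of document indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)  # j > i, so index i is still valid after the pop
        clusters[i].extend(merged)
    return clusters


# Made-up sample reports, not data from the paper.
reports = [
    "Worker slipped on wet floor near the rig.",
    "Slipped on oily floor; wet surface in the workshop.",
    "Dropped object: a wrench fell from the derrick.",
    "Wrench dropped from height, fell near crew.",
]
clusters = agglomerative(tfidf([clean(r) for r in reports]), n_clusters=2)
print(clusters)
```

On these four reports, the two slip reports share no terms with the two dropped-object reports, so the cross-group distance is at its maximum and the two groups end up in separate clusters. For real report volumes, a library implementation (for example scikit-learn's `TfidfVectorizer` and `AgglomerativeClustering`) would be the more practical choice than this quadratic pure-Python loop.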
