Challenges to Opportunity: Getting Value Out of Unstructured Data Management

P. Kumar, A. Tveritnev, Salahuddin Abdullah Jan, Rauf Iqbal
DOI: 10.2118/214251-ms. Published in: Day 2 Tue, March 14, 2023 (publication date 2023-03-13).
Citations: 1

Abstract

The phrase "unstructured data" usually refers to information that does not reside in a traditional row-column database. The larger part of enterprise data, nearly 80%, is unstructured and has been much less accessible. From emails, text documents, study reports, presentations, and memos to audio, video, and more, unstructured data is a huge body of information. This paper proposes a work-in-progress model for unstructured data management.

In any E&P company, data lies in unstructured formats across local drives, network drives, SharePoint sites, emails, and so on. Data sensitivity plays an important role in classifying this data, but irrespective of its classification, the data still holds valuable information that can be used to predict business problems analytically. Because knowledge is shared across the business through emails, attachments, flat files, and presentations, a robust system is required to manage unstructured data. Decision making is one example: business decision making happens over email or phone calls, so the business's emails hold a huge knowledge potential. This information needs to be extracted in a way that allows it to be used for analytical decision making in the future. Duplication is another important aspect of unstructured data management that needs to be tackled. A scan of the current system would find multiple copies of the same document stored in different places across the organization, because the same data keeps circulating among business users. A system that controls the duplication of unstructured data in a meaningful way would therefore benefit the organization.

With ongoing advances in machine learning and natural language processing, combined with analytical tools, the time has come to extract value from unstructured data. The proposed method first identifies, gathers, and classifies the unstructured data.
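The gathering, classification, and de-duplication steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the unstructured files are reachable under a single root folder, detects only byte-identical copies by comparing SHA-256 digests, and uses a hypothetical keyword rule (`SENSITIVE_KEYWORDS`) as a placeholder for a real sensitivity classifier.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical sensitivity rule: these keywords are illustrative
# assumptions, not taken from the paper.
SENSITIVE_KEYWORDS = ("confidential", "restricted")


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def gather(root):
    """Identify and gather every regular file under a root folder."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def classify(path):
    """Label a file 'sensitive' if its text mentions any keyword, else 'general'."""
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        text = f.read().lower()
    return "sensitive" if any(k in text for k in SENSITIVE_KEYWORDS) else "general"


def find_duplicates(root):
    """Group files with byte-identical content; keep only groups of two or more."""
    groups = defaultdict(list)
    for path in gather(root):
        groups[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("/path/to/share")` returns a mapping from digest to the paths that hold identical content. Near-duplicates (for example, the same report saved as both PDF and DOCX) would need text extraction plus a similarity measure, which this sketch does not attempt.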
A content management tool is then created and used to organize and manage the unstructured data, together with a standard engine that handles unstructured data without converting it to a structured data format. An analytical engine is applied on top of this content to make predictions from the data. Whenever new data enters the content management system, it is ingested into the prediction analysis tool to assist the business in decision making.
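The ingestion loop, in which each new item entering content management is pushed to the analytical side, could be sketched as a small publish/subscribe arrangement. This is an illustrative assumption rather than the paper's design: `ContentStore`, `DecisionExtractor`, and the `DECISION_CUES` word list are hypothetical names, identical text is de-duplicated by its SHA-256 digest, and the extractor is a plain keyword filter standing in for the machine learning and NLP prediction engine the authors propose; it does not make predictions itself.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical cue words standing in for a real NLP model (assumption).
DECISION_CUES = ("decided", "approved", "agreed", "rejected")


@dataclass
class Document:
    doc_id: str
    source: str  # e.g. "email" or "network drive"
    text: str


@dataclass
class ContentStore:
    """Toy content management store that notifies listeners on ingest."""
    documents: Dict[str, Document] = field(default_factory=dict)
    listeners: List[Callable[[Document], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[Document], None]) -> None:
        self.listeners.append(listener)

    def add(self, source: str, text: str) -> bool:
        """Store a document once; return False if identical text already exists."""
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if doc_id in self.documents:
            return False  # duplicate content is neither stored nor re-analysed
        doc = Document(doc_id, source, text)
        self.documents[doc_id] = doc
        for listener in self.listeners:
            listener(doc)  # push every new document to the analytics side
        return True


@dataclass
class DecisionExtractor:
    """Stand-in analytical engine: keeps sentences that look like decisions."""
    decisions: List[str] = field(default_factory=list)

    def __call__(self, doc: Document) -> None:
        # Split on whitespace that follows sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", doc.text):
            if any(cue in sentence.lower() for cue in DECISION_CUES):
                self.decisions.append(sentence.strip())
```

After `store.subscribe(engine)`, calling `store.add("email", text)` stores the message once and records decision-like sentences; adding the same text again from another source returns `False` and triggers no further analysis. Keeping the store and the engine decoupled means a trained model could replace the keyword filter without changing the ingestion path.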