Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System.

IF 2.6 · CAS Zone 4 (Computer Science) · JCR Q2 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data Pub Date : 2025-04-01 Epub Date: 2023-10-31 DOI:10.1089/big.2022.0201

Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson

Journal: Big Data, pages 90-110. Publication type: Journal Article.

Citations: 0

Abstract

Organizations have been investing in analytics that rely on internal and external data to gain a competitive advantage. However, the legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health or finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation for violations of data governance. The era of big data has further intensified the challenge of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and difficult to solve, which increases the risk of data exposure and data loss. To mitigate that risk, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach toward data evaluations across the organization. A rule-based system that implements the corporate data classification policy will minimize the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception handling process with appropriate approval for extenuating circumstances. The system was implemented as a proof-of-concept working prototype to showcase its capabilities and provide hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The audience had an average experience of ∼25 years and a combined experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety is the most troubling, according to 90% of the commentators. In addition, with an approximate average of 60%, the evaluators confirmed that appropriate policies, procedures, and prerequisites for classification are in place, while implementation tools are lagging.
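The abstract describes a rule-based system that applies a corporate classification policy and routes exceptions through an approval step. The following is a minimal sketch of that general idea, not the authors' prototype: the confidentiality levels, the rule names, the dataset fields (`columns`, `source`), and the `ExceptionRequest` type are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical confidentiality levels, ordered from least to most restrictive.
LEVELS = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class Rule:
    name: str                       # label reported when the rule fires
    matches: Callable[[dict], bool]  # predicate over dataset metadata
    level: str                      # level the policy assigns on a match


# Hypothetical policy rules; a real policy would come from the organization.
RULES = [
    Rule("health-data", lambda d: "diagnosis" in d.get("columns", ()), "restricted"),
    Rule("payment-data", lambda d: "iban" in d.get("columns", ()), "confidential"),
    Rule("external-source", lambda d: d.get("source") == "external", "internal"),
]


def classify(dataset: dict, default: str = "public") -> Tuple[str, List[str]]:
    """Return the most restrictive level among matching rules, plus the rule names."""
    fired = [r for r in RULES if r.matches(dataset)]
    if not fired:
        return default, []
    level = max((r.level for r in fired), key=LEVELS.index)
    return level, [r.name for r in fired]


@dataclass
class ExceptionRequest:
    """Hypothetical exception record; it only takes effect once someone approves it."""
    requested_level: str
    justification: str
    approved_by: Optional[str] = None


def effective_level(dataset: dict, request: Optional[ExceptionRequest] = None) -> str:
    """Apply the rule-based level unless an approved exception overrides it."""
    level, _ = classify(dataset)
    if request is not None and request.approved_by is not None:
        return request.requested_level
    return level


claims = {"columns": ["patient_id", "diagnosis", "iban"], "source": "internal"}
pending = ExceptionRequest("internal", "anonymized extract for analytics")
approved = ExceptionRequest("internal", "anonymized extract for analytics", approved_by="dpo")

print(classify(claims))
print(effective_level(claims, pending))
print(effective_level(claims, approved))
```

Taking the most restrictive matching level is one possible conflict-resolution choice; a policy could equally use rule priorities. The unapproved request leaves the rule-based level in place, which mirrors the idea that exceptions require appropriate approval.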


Source journal

Big Data COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-COMPUTER SCIENCE, THEORY & METHODS

CiteScore

9.10

Self-citation rate

2.20%

Articles published

Journal description: Big Data is the leading peer-reviewed journal covering the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data. The Journal addresses questions surrounding this powerful and growing field of data science and facilitates the efforts of researchers, business managers, analysts, developers, data scientists, physicists, statisticians, infrastructure developers, academics, and policymakers to improve operations, profitability, and communications within their businesses and institutions. Spanning a broad array of disciplines focusing on novel big data technologies, policies, and innovations, the Journal brings together the community to address current challenges and support effective efforts to organize, store, disseminate, protect, and manipulate data and, most importantly, to find the most effective strategies to make this incredible amount of information work to benefit society, industry, academia, and government. Big Data coverage includes: big data industry standards; new technologies being developed specifically for big data; data acquisition, cleaning, distribution, and best practices; data protection, privacy, and policy; business interests from research to product; the changing role of business intelligence; visualization and design principles of big data infrastructures; physical interfaces and robotics; social networking advantages for Facebook, Twitter, Amazon, Google, etc.; and opportunities around big data and how companies can harness it to their advantage.