Rigorous Measurement Model for Validity of Big Data: MEGA Approach

Proceedings of the 25th International Database Engineering & Applications Symposium. Pub Date: 2021-07-14. DOI: 10.1145/3472163.3472171

Dave Bhardwaj, O. Ormandjieva



Abstract

Big Data is becoming a substantial part of decision-making processes in both industry and academia, especially in areas where it may have a profound impact on businesses and society. However, as more data is processed, data quality is becoming a genuine issue: the lack of visibility into and transparency of the underlying data undermines the credibility of the systems we build. Big Data quality measurement is therefore increasingly necessary for assessing whether data can serve its purpose in a particular context, such as Big Data analytics. This research addresses Big Data quality measurement modelling and automation by proposing a novel quality measurement framework for Big Data (MEGA) that objectively assesses the underlying quality characteristics of Big Data (also known as the V's of Big Data) at each step of the Big Data pipeline. Measurement of five of the Big Data V's (Volume, Variety, Velocity, Veracity and Validity) is currently automated by the MEGA framework. In this paper, a new, theoretically valid quality measurement model is proposed for an essential quality characteristic of Big Data, called Validity. The proposed measurement information model for Validity of Big Data is a hierarchy of 4 derived measures/indicators and 5 base measures. Validity measurement is illustrated on a running example.

