Rigorous Measurement Model for Validity of Big Data: MEGA Approach

Dave Bhardwaj, O. Ormandjieva
{"title":"Rigorous Measurement Model for Validity of Big Data: MEGA Approach","authors":"Dave Bhardwaj, O. Ormandjieva","doi":"10.1145/3472163.3472171","DOIUrl":null,"url":null,"abstract":"Big Data is becoming a substantial part of the decision-making processes in both industry and academia, especially in areas where Big Data may have a profound impact on businesses and society. However, as more data is being processed, data quality is becoming a genuine issue that negatively affects credibility of the systems we build because of the lack of visibility and transparency of the underlying data. Therefore, Big Data quality measurement is becoming increasingly necessary in assessing whether data can serve its purpose in a particular context (such as Big Data analytics, for example). This research addresses Big Data quality measurement modelling and automation by proposing a novel quality measurement framework for Big Data (MEGA) that objectively assesses the underlying quality characteristics of Big Data (also known as the V's of Big Data) at each step of the Big Data Pipelines. Five of the Big Data V's (Volume, Variety, Velocity, Veracity and Validity) are currently automated by the MEGA framework. In this paper, a new theoretically valid quality measurement model is proposed for an essential quality characteristic of Big Data, called Validity. The proposed measurement information model for Validity of Big Data is a hierarchy of 4 derived measures / indicators and 5 based measures. Validity measurement is illustrated on a running example.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Database Engineering & Applications Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3472163.3472171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Big Data is becoming a substantial part of the decision-making processes in both industry and academia, especially in areas where Big Data may have a profound impact on businesses and society. However, as more data is being processed, data quality is becoming a genuine issue that negatively affects credibility of the systems we build because of the lack of visibility and transparency of the underlying data. Therefore, Big Data quality measurement is becoming increasingly necessary in assessing whether data can serve its purpose in a particular context (such as Big Data analytics, for example). This research addresses Big Data quality measurement modelling and automation by proposing a novel quality measurement framework for Big Data (MEGA) that objectively assesses the underlying quality characteristics of Big Data (also known as the V's of Big Data) at each step of the Big Data Pipelines. Five of the Big Data V's (Volume, Variety, Velocity, Veracity and Validity) are currently automated by the MEGA framework. In this paper, a new theoretically valid quality measurement model is proposed for an essential quality characteristic of Big Data, called Validity. The proposed measurement information model for Validity of Big Data is a hierarchy of 4 derived measures / indicators and 5 based measures. Validity measurement is illustrated on a running example.
大数据有效性的严格度量模型:MEGA方法
大数据正在成为工业界和学术界决策过程的重要组成部分,特别是在大数据可能对企业和社会产生深远影响的领域。然而,随着越来越多的数据被处理,由于底层数据缺乏可见性和透明度,数据质量正在成为一个真正的问题,它对我们构建的系统的可信度产生了负面影响。因此,在评估数据是否能够在特定环境中服务于其目的(例如,大数据分析)时,大数据质量测量变得越来越必要。本研究通过提出一种新的大数据质量测量框架(MEGA)来解决大数据质量测量建模和自动化问题,该框架在大数据管道的每个步骤中客观地评估大数据的潜在质量特征(也称为大数据的V)。五大大数据V (Volume, Variety, Velocity, Veracity和Validity)目前由MEGA框架自动化。本文针对大数据的本质质量特征——有效性,提出了一种新的理论上有效的质量度量模型。提出的大数据有效性度量信息模型是由4个派生度量/指标和5个基于度量组成的层次结构。通过运行实例说明了有效性度量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信