An Introduction to Dynamic Data Quality Challenges
Alan G. Labouseur, C. Matheus
Journal of Data and Information Quality (JDIQ), pp. 1-3. Published 2017-01-04.
DOI: https://doi.org/10.1145/2998575
Citations: 11
Abstract
We live in an evolving world. As time passes, data changes in content and structure, and thus becomes dynamic. Data quality, therefore, also becomes dynamic because it is an aggregate characteristic of data itself. Thus, our evolving world and the Internet of Things (IoT) present renewed challenges in data quality. IoT data is teeming with multivendor and multiprovider applications, devices, microservices, and automated processes built on social media, public and private datasets, digitized records, sensor logs, web logs, and much more. From intelligent traffic systems to smart healthcare devices, modern enterprises are inundated with a daily deluge of dynamic big data. The primary characteristics of big data are volume, velocity, and variety [Abadi et al. 2014]. Techniques for managing volume and velocity have been under development for decades. While some work has been done on variety, integrating and analyzing data from diverse sources and formats still presents challenges. For example, much of the big data deluge is structured and much of it is not. This single dimension of variety inherent in today’s IoT clearly illustrates there is no “silver bullet” and one size does not fit all [Abadi et al. 2014; Stonebraker and Cetintemel 2005, 2015]. It is important to note there are many other dimensions of variety beyond structure. We must consider possibilities arising from analyzing data in a dizzying range of data types found in varying time frames of differing granularity from diverse sources in our evolving and streaming world. Structure is but one example illustrative of many more general challenges that we use in this article to introduce dynamic data quality.