摆脱数据

Journal of Data and Information Quality (JDIQ) Pub Date : 2019-11-11 DOI:10.1145/3326920

T. Milo

{"title":"摆脱数据","authors":"T. Milo","doi":"10.1145/3326920","DOIUrl":null,"url":null,"abstract":"We are experiencing an amazing data-centered revolution. Incredible amounts of data are collected, integrated, and analyzed, leading to key breakthroughs in science and society. This well of knowledge, however, is at a great risk if we do not dispense with some of the data flood. First, the amount of generated data grows exponentially and already at 2020 is expected to be more than twice the available storage. Second, even disregarding storage constraints, uncontrolled data retention risks privacy and security, as recognized, e.g., by the recent EU Data Protection reform. Data disposal policies must be developed to benefit and protect organizations and individuals. Retaining the knowledge hidden in the data while respecting storage, processing, and regulatory constraints is a great challenge. The difficulty stems from the distinct, intricate requirements entailed by each type of constraint, the scale and velocity of data, and the constantly evolving needs. While multiple data sketching, summarization, and deletion techniques were developed to address specific aspects of the problem, we are still very far from a comprehensive solution. Every organization has to battle the same tough challenges with ad hoc solutions that are application-specific and rarely sharable. In this article, we will discuss the logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraints enforcement and for the development of applications over the retained information. In particular, we will overview relevant related work, highlighting new research challenges and potential reuse of existing techniques.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"10 1","pages":"1 - 7"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Getting Rid of Data\",\"authors\":\"T. Milo\",\"doi\":\"10.1145/3326920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We are experiencing an amazing data-centered revolution. Incredible amounts of data are collected, integrated, and analyzed, leading to key breakthroughs in science and society. This well of knowledge, however, is at a great risk if we do not dispense with some of the data flood. First, the amount of generated data grows exponentially and already at 2020 is expected to be more than twice the available storage. Second, even disregarding storage constraints, uncontrolled data retention risks privacy and security, as recognized, e.g., by the recent EU Data Protection reform. Data disposal policies must be developed to benefit and protect organizations and individuals. Retaining the knowledge hidden in the data while respecting storage, processing, and regulatory constraints is a great challenge. The difficulty stems from the distinct, intricate requirements entailed by each type of constraint, the scale and velocity of data, and the constantly evolving needs. While multiple data sketching, summarization, and deletion techniques were developed to address specific aspects of the problem, we are still very far from a comprehensive solution. Every organization has to battle the same tough challenges with ad hoc solutions that are application-specific and rarely sharable. In this article, we will discuss the logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraints enforcement and for the development of applications over the retained information. In particular, we will overview relevant related work, highlighting new research challenges and potential reuse of existing techniques.\",\"PeriodicalId\":15582,\"journal\":{\"name\":\"Journal of Data and Information Quality (JDIQ)\",\"volume\":\"10 1\",\"pages\":\"1 - 7\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Data and Information Quality (JDIQ)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3326920\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Data and Information Quality (JDIQ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3326920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

我们正在经历一场惊人的以数据为中心的革命。大量的数据被收集、整合和分析，导致科学和社会的重大突破。然而，如果我们不放弃一些数据洪流，这口知识之井将面临巨大的风险。首先，生成的数据量呈指数级增长，预计到2020年将超过可用存储的两倍。其次，正如最近欧盟数据保护改革所认识到的那样，即使不考虑存储约束，不受控制的数据保留也会给隐私和安全带来风险。必须制定数据处理策略以使组织和个人受益并得到保护。在尊重存储、处理和监管约束的同时保留隐藏在数据中的知识是一个巨大的挑战。这种困难源于每种类型的约束所带来的独特而复杂的需求、数据的规模和速度以及不断变化的需求。虽然开发了多种数据草图、摘要和删除技术来解决问题的特定方面，但我们离全面的解决方案还很远。每个组织都必须使用特定于应用程序且很少共享的临时解决方案来应对同样严峻的挑战。在本文中，我们将讨论系统地处理大规模数据、执行约束以及在保留的信息上开发应用程序所需的逻辑、算法和方法基础。特别是，我们将概述相关工作，强调新的研究挑战和现有技术的潜在重用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Getting Rid of Data

We are experiencing an amazing data-centered revolution. Incredible amounts of data are collected, integrated, and analyzed, leading to key breakthroughs in science and society. This well of knowledge, however, is at a great risk if we do not dispense with some of the data flood. First, the amount of generated data grows exponentially and already at 2020 is expected to be more than twice the available storage. Second, even disregarding storage constraints, uncontrolled data retention risks privacy and security, as recognized, e.g., by the recent EU Data Protection reform. Data disposal policies must be developed to benefit and protect organizations and individuals. Retaining the knowledge hidden in the data while respecting storage, processing, and regulatory constraints is a great challenge. The difficulty stems from the distinct, intricate requirements entailed by each type of constraint, the scale and velocity of data, and the constantly evolving needs. While multiple data sketching, summarization, and deletion techniques were developed to address specific aspects of the problem, we are still very far from a comprehensive solution. Every organization has to battle the same tough challenges with ad hoc solutions that are application-specific and rarely sharable. In this article, we will discuss the logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraints enforcement and for the development of applications over the retained information. In particular, we will overview relevant related work, highlighting new research challenges and potential reuse of existing techniques.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Data and Information Quality (JDIQ)

自引率

0.00%

发文量