Shang Liu, Hao Du, Yang Cao, Bo Yan, Jinfei Liu, Masatoshi Yoshikawa
{"title":"PGB: Benchmarking Differentially Private Synthetic Graph Generation Algorithms","authors":"Shang Liu, Hao Du, Yang Cao, Bo Yan, Jinfei Liu, Masatoshi Yoshikawa","doi":"arxiv-2408.02928","DOIUrl":"https://doi.org/arxiv-2408.02928","url":null,"abstract":"Differentially private graph analysis is a powerful tool for deriving insights from diverse graph data while protecting individual information. Designing private analytic algorithms for different graph queries often requires starting from scratch. In contrast, differentially private synthetic graph generation offers a general paradigm that supports one-time generation for multiple queries. Although a rich set of differentially private graph generation algorithms has been proposed, comparing them effectively remains challenging due to various factors, including differing privacy definitions, diverse graph datasets, varied privacy requirements, and multiple utility metrics. To this end, we propose PGB (Private Graph Benchmark), a comprehensive benchmark designed to enable researchers to compare differentially private graph generation algorithms fairly. We begin by identifying four essential elements of existing works as a 4-tuple: mechanisms, graph datasets, privacy requirements, and utility metrics. We discuss principles regarding these elements to ensure the comprehensiveness of a benchmark. Next, we present a benchmark instantiation that adheres to all principles, establishing a new method to evaluate existing and newly proposed graph generation algorithms. Through extensive theoretical and empirical analysis, we gain valuable insights into the strengths and weaknesses of prior algorithms. Our results indicate that there is no universal solution for all possible cases. Finally, we provide guidelines to help researchers select appropriate mechanisms for various scenarios.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinwei Lin, Jing Zhao, Peng Di, Chuan Xiao, Rui Mao, Yan Ji, Makoto Onizuka, Zishuo Ding, Weiyi Shang, Jianbin Qin
{"title":"Automatic String Data Validation with Pattern Discovery","authors":"Xinwei Lin, Jing Zhao, Peng Di, Chuan Xiao, Rui Mao, Yan Ji, Makoto Onizuka, Zishuo Ding, Weiyi Shang, Jianbin Qin","doi":"arxiv-2408.03005","DOIUrl":"https://doi.org/arxiv-2408.03005","url":null,"abstract":"In enterprise data pipelines, data insertions occur periodically and may impact downstream services if data quality issues are not addressed. Typically, such problems can be investigated and fixed by on-call engineers, but locating the cause of such problems and fixing errors are often time-consuming. Therefore, automatic data validation is a better solution to defend the system and downstream services by enabling early detection of errors and providing detailed error messages for quick resolution. This paper proposes a self-validate data management system with automatic pattern discovery techniques to verify the correctness of semi-structural string data in enterprise data pipelines. Our solution extracts patterns from historical data and detects erroneous incoming data in a top-down fashion. High-level information of historical data is analyzed to discover the format skeleton of correct values. Fine-grained semantic patterns are then extracted to strike a balance between generalization and specification of the discovered pattern, thus covering as many correct values as possible while avoiding over-fitting. To tackle cold start and rapid data growth, we propose an incremental update strategy and example generalization strategy. Experiments on large-scale industrial and public datasets demonstrate the effectiveness and efficiency of our method compared to alternative solutions. Furthermore, a case study on an industrial platform (Ant Group Inc.) with thousands of applications shows that our system captures meaningful data patterns in daily operations and helps engineers quickly identify errors.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeurDB: On the Design and Implementation of an AI-powered Autonomous Database","authors":"Zhanhao Zhao, Shaofeng Cai, Haotian Gao, Hexiang Pan, Siqi Xiang, Naili Xing, Gang Chen, Beng Chin Ooi, Yanyan Shen, Yuncheng Wu, Meihui Zhang","doi":"arxiv-2408.03013","DOIUrl":"https://doi.org/arxiv-2408.03013","url":null,"abstract":"Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper introduces NeurDB, an AI-powered autonomous database that deepens the fusion of AI and databases with adaptability to data and workload drift. NeurDB establishes a new in-database AI ecosystem that seamlessly integrates AI workflows within the database. This integration enables efficient and effective in-database AI analytics and fast-adaptive learned system components. Empirical evaluations demonstrate that NeurDB substantially outperforms existing solutions in managing AI analytics tasks, with the proposed learned components more effectively handling environmental dynamism than state-of-the-art approaches.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report]","authors":"Enhao Zhang, Nicole Sullivan, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska","doi":"arxiv-2408.02243","DOIUrl":"https://doi.org/arxiv-2408.02243","url":null,"abstract":"Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF generates multiple candidate UDFs and uses active learning to efficiently select the best one. With the self-enhancing capability, VOCAL-UDF significantly improves query performance across three video datasets.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiyan Li, Haoyang Li, Zhao Pu, Jing Zhang, Xinyi Zhang, Tao Ji, Luming Sun, Cuiping Li, Hong Chen
{"title":"Is Large Language Model Good at Database Knob Tuning? A Comprehensive Experimental Evaluation","authors":"Yiyan Li, Haoyang Li, Zhao Pu, Jing Zhang, Xinyi Zhang, Tao Ji, Luming Sun, Cuiping Li, Hong Chen","doi":"arxiv-2408.02213","DOIUrl":"https://doi.org/arxiv-2408.02213","url":null,"abstract":"Knob tuning plays a crucial role in optimizing databases by adjusting knobs to enhance database performance. However, traditional tuning methods often follow a Try-Collect-Adjust approach, proving inefficient and database-specific. Moreover, these methods are often opaque, making it challenging for DBAs to grasp the underlying decision-making process. The emergence of large language models (LLMs) like GPT-4 and Claude-3 has excelled in complex natural language tasks, yet their potential in database knob tuning remains largely unexplored. This study harnesses LLMs as experienced DBAs for knob-tuning tasks with carefully designed prompts. We identify three key subtasks in the tuning system: knob pruning, model initialization, and knob recommendation, proposing LLM-driven solutions to replace conventional methods for each subtask. We conduct extensive experiments to compare LLM-driven approaches against traditional methods across the subtasks to evaluate LLMs' efficacy in the knob tuning domain. Furthermore, we explore the adaptability of LLM-based solutions in diverse evaluation settings, encompassing new benchmarks, database engines, and hardware environments. Our findings reveal that LLMs not only match or surpass traditional methods but also exhibit notable interpretability by generating responses in a coherent \"chain-of-thought\" manner. We further observe that LLMs exhibit remarkable generalizability through simple adjustments in prompts, eliminating the necessity for additional training or extensive code modifications. Drawing insights from our experimental findings, we identify several opportunities for future research aimed at advancing the utilization of LLMs in the realm of database management.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Masayoshi Kozai (Polar Environment Data Science Center, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Tachikawa, Japan), Yoshimasa Tanaka (Polar Environment Data Science Center, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Tachikawa, Japan; National Institute of Polar Research, Research Organization of Information and Systems, Tachikawa, Japan), Shuji Abe (International Research Center for Space and Planetary Environmental Science, Kyushu University, Fukuoka, Japan), Yasuyuki Minamiyama (Research Center for Open Science and Data Platform, National Institute of Informatics, Research Organization of Information and Systems, Tokyo, Japan), Atsuki Shinbori (Institute for Space and Earth Environmental Research Center for Integrated Data Science, Nagoya University, Nagoya, Japan), Akira Kadokura (Polar Environment Data Science Center, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Tachikawa, Japan)
{"title":"AMIDER: A Multidisciplinary Research Database and Its Application to Promote Open Science","authors":"Masayoshi Kozai (Polar Environment Data Science Center, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Tachikawa, Japan), Yoshimasa Tanaka (Polar Environment Data Science Center, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Tachikawa, Japan; National Institute of Polar Research, Research Organization of Information and Systems, Tachikawa, Japan), Shuji Abe (International Research Center for Space and Planetary Environmental Science, Kyushu University, Fukuoka, Japan), Yasuyuki Minamiyama (Research Center for Open Science and Data Platform, National Institute of Informatics, Research Organization of Information and Systems, Tokyo, Japan), Atsuki Shinbori (Institute for Space and Earth Environmental Research Center for Integrated Data Science, Nagoya University, Nagoya, Japan), Akira Kadokura (Polar Environment Data Science Center, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Tachikawa, Japan)","doi":"arxiv-2408.02246","DOIUrl":"https://doi.org/arxiv-2408.02246","url":null,"abstract":"The AMIDER, Advanced Multidisciplinary Integrated-Database for Exploring new Research, is a newly developed research data catalog to demonstrate an advanced database application. AMIDER is characterized as a multidisciplinary database equipped with a user-friendly web application. Its catalog view displays diverse research data at once beyond any limitation of each individual discipline. Some useful functions, such as a selectable data download, data format conversion, and display of data visual information, are also implemented. Further advanced functions, such as visualization of dataset mutual relationship, are also implemented as a preliminary trial. These characteristics and functions are expected to enhance the accessibility to individual research data, even from non-expertized users, and be helpful for collaborations among diverse scientific fields beyond individual disciplines. Multidisciplinary data management is also one of AMIDER's uniqueness, where various metadata schemas can be mapped to a uniform metadata table, and standardized and self-describing data formats are adopted. AMIDER website (https://amider.rois.ac.jp/) had been launched in April 2024. As of July 2024, over 15,000 metadata in various research fields of polar science have been registered in the database, and approximately 500 visitors are viewing the website every day on average. Expansion of the database to further multidisciplinary scientific fields, not only polar science, is planned, and advanced attempts, such as applying Natural Language Processing (NLP) to metadata, have also been considered.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Montero, Guido Kraemer, Anca Anghelea, César Aybar, Gunnar Brandt, Gustau Camps-Valls, Felix Cremer, Ida Flik, Fabian Gans, Sarah Habershon, Chaonan Ji, Teja Kattenborn, Laura Martínez-Ferrer, Francesco Martinuzzi, Martin Reinhardt, Maximilian Söchting, Khalil Teber, Miguel D. Mahecha
{"title":"Earth System Data Cubes: Avenues for advancing Earth system research","authors":"David Montero, Guido Kraemer, Anca Anghelea, César Aybar, Gunnar Brandt, Gustau Camps-Valls, Felix Cremer, Ida Flik, Fabian Gans, Sarah Habershon, Chaonan Ji, Teja Kattenborn, Laura Martínez-Ferrer, Francesco Martinuzzi, Martin Reinhardt, Maximilian Söchting, Khalil Teber, Miguel D. Mahecha","doi":"arxiv-2408.02348","DOIUrl":"https://doi.org/arxiv-2408.02348","url":null,"abstract":"Recent advancements in Earth system science have been marked by the exponential increase in the availability of diverse, multivariate datasets characterised by moderate to high spatio-temporal resolutions. Earth System Data Cubes (ESDCs) have emerged as one suitable solution for transforming this flood of data into a simple yet robust data structure. ESDCs achieve this by organising data into an analysis-ready format aligned with a spatio-temporal grid, facilitating user-friendly analysis and diminishing the need for extensive technical data processing knowledge. Despite these significant benefits, the completion of the entire ESDC life cycle remains a challenging task. Obstacles are not only of a technical nature but also relate to domain-specific problems in Earth system research. There exist barriers to realising the full potential of data collections in light of novel cloud-based technologies, particularly in curating data tailored for specific application domains. These include transforming data to conform to a spatio-temporal grid with minimum distortions and managing complexities such as spatio-temporal autocorrelation issues. Addressing these challenges is pivotal for the effective application of Artificial Intelligence (AI) approaches. Furthermore, adhering to open science principles for data dissemination, reproducibility, visualisation, and reuse is crucial for fostering sustainable research. Overcoming these challenges offers a substantial opportunity to advance data-driven Earth system research, unlocking the full potential of an integrated, multidimensional view of Earth system processes. This is particularly true when such research is coupled with innovative research paradigms and technological progress.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Path Association Rules in Large Property Graphs (with Appendix)","authors":"Yuya Sasaki, Panagiotis Karras","doi":"arxiv-2408.02029","DOIUrl":"https://doi.org/arxiv-2408.02029","url":null,"abstract":"How can we mine frequent path regularities from a graph with edge labels and vertex attributes? The task of association rule mining successfully discovers regular patterns in item sets and substructures. Still, to our best knowledge, this concept has not yet been extended to path patterns in large property graphs. In this paper, we introduce the problem of path association rule mining (PARM). Applied to any reachability path between two vertices within a large graph, PARM discovers regular ways in which path patterns, identified by vertex attributes and edge labels, co-occur with each other. We develop an efficient and scalable algorithm PIONEER that exploits an anti-monotonicity property to effectively prune the search space. Further, we devise approximation techniques and employ parallelization to achieve scalable path association rule mining. Our experimental study using real-world graph data verifies the significance of path association rules and the efficiency of our solutions.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcelo Arenas, Timo Camillo Merkl, Reinhard Pichler, Cristian Riveros
{"title":"Towards Tractability of the Diversity of Query Answers: Ultrametrics to the Rescue","authors":"Marcelo Arenas, Timo Camillo Merkl, Reinhard Pichler, Cristian Riveros","doi":"arxiv-2408.01657","DOIUrl":"https://doi.org/arxiv-2408.01657","url":null,"abstract":"The set of answers to a query may be very large, potentially overwhelming users when presented with the entire set. In such cases, presenting only a small subset of the answers to the user may be preferable. A natural requirement for this subset is that it should be as diverse as possible to reflect the variety of the entire population. To achieve this, the diversity of a subset is measured using a metric that determines how different two solutions are and a diversity function that extends this metric from pairs to sets. In the past, several studies have shown that finding a diverse subset from an explicitly given set is intractable even for simple metrics (like Hamming distance) and simple diversity functions (like summing all pairwise distances). This complexity barrier becomes even more challenging when trying to output a diverse subset from a set that is only implicitly given such as the query answers of a query and a database. Until now, tractable cases have been found only for restricted problems and particular diversity functions. To overcome these limitations, we focus on the notion of ultrametrics, which have been widely studied and used in many applications. Starting from any ultrametric $d$ and a diversity function $\\delta$ extending $d$, we provide sufficient conditions over $\\delta$ for having polynomial-time algorithms to construct diverse answers. To the best of our knowledge, these conditions are satisfied by all diversity functions considered in the literature. Moreover, we complement these results with lower bounds that show specific cases when these conditions are not satisfied and finding diverse subsets becomes intractable. We conclude by applying these results to the evaluation of conjunctive queries, demonstrating efficient algorithms for finding a diverse subset of solutions for acyclic conjunctive queries when the attribute order is used to measure diversity.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Controlling Dataflows with a Bolt-on Data Escrow","authors":"Zhiru Zhu, Raul Castro Fernandez","doi":"arxiv-2408.01580","DOIUrl":"https://doi.org/arxiv-2408.01580","url":null,"abstract":"The data-driven economy has created tremendous value in our society. Individuals share their data with platforms in exchange for services such as search, social networks, and health recommendations. Platforms use the data to provide those services and create other revenue-generating opportunities, e.g., selling the data to data brokers. With the ever-expanding data economy comes the growing concern about potential data misuse. While most platforms give individuals certain control over their data (i.e., what data is being shared), individuals do not know how the data will be used once shared; they cannot control the purpose. In this paper, we introduce a data escrow design that permits individuals to observe all dataflows - not just what is shared but for what purpose. Rather than data flowing to the platform, the platform delegates their computation to the escrow, where individuals can observe and manage their data. To make the data escrow practical, we design and implement a prototype that works alongside the Apple ecosystem; specifically, we retrofit the Apple SDKs with a programming interface to enable delegated computation. Our solution does not depend on Apple's software and can be applied to other platforms, but building for Apple lets us study the main hypothesis of our work: whether such a data escrow solution is a feasible alternative to today's data governance. We show that our escrow prototype implementation is efficient, and we analyze the dataflows in real-world apps and show that the escrow's programming interface supports implementing a wide range of dataflows.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}