Programmable Dataflows: Abstraction and Programming Model for Data Sharing
Siyuan Xia, Chris Zhu, Tapan Srivastava, Bridget Fahey, Raul Castro Fernandez
arXiv - CS - Databases, 2024-08-07
https://doi.org/arxiv-2408.04092

Data sharing is central to a wide variety of applications such as fraud
detection, ad matching, and research. The lack of data sharing abstractions
makes the solution to each data sharing problem bespoke and cost-intensive,
hampering value generation. In this paper, we first introduce a data sharing
model that represents every data sharing problem as a sequence of dataflows.
From the model, we distill an abstraction, the contract, which agents use to
communicate the intent of a dataflow and evaluate its consequences before the
dataflow takes place. This helps agents move towards a common sharing goal
without violating any regulatory or privacy constraints. Then, we design and
implement the contract programming model (CPM), which allows agents to program
data sharing applications tailored to each problem's needs. Contracts permit
data sharing, but their interactive nature may introduce inefficiencies. To
mitigate those inefficiencies, we extend the CPM so that it can save
intermediate outputs of dataflows and skip computation when a dataflow tries
to access data it is not permitted to read. In our evaluation, we
show that 1) the contract abstraction is general enough to represent a wide
range of sharing problems, 2) we can write programs for complex data sharing
problems that exhibit qualitative improvements over alternative
technologies, and 3) quantitatively, our optimizations make sharing programs
written with the CPM efficient.
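
The contract idea described above can be illustrated with a small sketch. This is not the CPM's actual API: the names `Contract`, `Platform`, `propose`, `approve`, and `run` are hypothetical, and the rules used here (every owner of a source must approve, and cached outputs are keyed by function and sources) are assumptions made only for illustration. The sketch shows a dataflow being declared before it runs, the access check happening before any computation, and an intermediate output being reused on a repeated request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Contract:
    """Hypothetical contract: states a dataflow's intent before it runs."""
    function: str                 # name of the computation to run
    sources: FrozenSet[str]       # data elements the dataflow reads
    destinations: FrozenSet[str]  # agents that would receive the output


@dataclass
class Platform:
    """Hypothetical mediator that holds data, approvals, and cached outputs."""
    data: Dict[str, list]
    owners: Dict[str, str]  # data element -> owning agent
    functions: Dict[str, Callable[..., object]]
    approvals: Dict[Contract, Set[str]] = field(default_factory=dict)
    cache: Dict[Tuple[str, FrozenSet[str]], object] = field(default_factory=dict)
    executions: int = 0     # counts how often a function really ran

    def propose(self, contract: Contract) -> None:
        self.approvals.setdefault(contract, set())

    def approve(self, contract: Contract, agent: str) -> None:
        self.approvals[contract].add(agent)

    def is_allowed(self, contract: Contract) -> bool:
        # Assumption: every owner of a source must approve the contract.
        required = {self.owners[s] for s in contract.sources}
        return required <= self.approvals.get(contract, set())

    def run(self, contract: Contract) -> object:
        # Check access first, so a disallowed dataflow costs no computation.
        if not self.is_allowed(contract):
            raise PermissionError("contract is not approved by all data owners")
        key = (contract.function, contract.sources)
        if key not in self.cache:
            inputs = [self.data[s] for s in sorted(contract.sources)]
            self.cache[key] = self.functions[contract.function](*inputs)
            self.executions += 1
        return self.cache[key]


def shared_accounts(a: list, b: list) -> list:
    """Toy fraud-detection step: account IDs flagged by both parties."""
    return sorted(set(a) & set(b))


platform = Platform(
    data={"bank_a.flagged": [1, 2, 3], "bank_b.flagged": [2, 3, 4]},
    owners={"bank_a.flagged": "bank_a", "bank_b.flagged": "bank_b"},
    functions={"shared_accounts": shared_accounts},
)
contract = Contract(
    function="shared_accounts",
    sources=frozenset({"bank_a.flagged", "bank_b.flagged"}),
    destinations=frozenset({"bank_a", "bank_b"}),
)
platform.propose(contract)
platform.approve(contract, "bank_a")
platform.approve(contract, "bank_b")

result = platform.run(contract)  # computes the intersection once
again = platform.run(contract)   # served from the cached output
```

In this sketch a contract that lacks an owner's approval fails inside `run` before any function is invoked, and the second call returns the saved output without recomputing. A real system would also need to invalidate cached outputs when the underlying data changes, which is left out here.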