SODA: A Set of Fast Oblivious Algorithms in Distributed Secure Data Analytics

Proc. VLDB Endow. Pub Date: 2023-03-01. DOI: 10.14778/3587136.3587142

Xiang Li, Nuozhou Sun, Yunqian Luo, M. Gao

Pages: 1671-1684


Abstract

Cloud systems are now a prevalent platform for hosting large-scale data analytics applications such as machine learning and relational databases. However, data privacy remains a critical concern for public cloud systems. Existing trusted hardware can provide an isolated execution domain on an untrusted platform, but it still suffers from access-pattern-based side channels at various levels, including memory, disk, and network. Oblivious algorithms address these vulnerabilities by hiding the program's data access patterns. Unfortunately, current oblivious algorithms for data analytics are limited to single-machine execution, support only simple operations, and/or suffer from significant performance overheads due to expensive global sorts and excessive data padding.

In this work, we propose SODA, a set of efficient oblivious algorithms for distributed data analytics operators, including filter, aggregate, and binary equi-join. To improve performance, SODA completely avoids the expensive oblivious global sort primitive and minimizes data padding overheads. In oblivious filter and aggregate, SODA uses low-cost (pseudo-)random communication instead of an expensive global sort to ensure uniform data traffic. In oblivious join, it adopts a novel two-level bin-packing approach that alleviates skew in both input redistribution and the join product, thus minimizing the necessary data padding. Compared to the state-of-the-art system, SODA both extends the functionality and improves the performance, achieving 1.1× to 14.6× speedups on complex multi-operator data analytics workloads.
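The idea of using pseudo-random communication to keep network traffic uniform during a filter can be illustrated with a small single-process toy. This is only a sketch under stated assumptions and not SODA's actual algorithm: the function name `random_route_filter`, the `DUMMY` marker, and the `slots_per_link` parameter are all hypothetical, workers are simulated as Python lists, and the in-memory processing here is not itself oblivious. It shows one thing only: if each surviving row goes to a pseudo-randomly chosen destination and every worker-to-worker message is padded to the same fixed size, the observable message sizes do not depend on which rows passed the predicate.

```python
import random

DUMMY = None  # hypothetical marker for padding rows


def random_route_filter(partitions, predicate, slots_per_link, seed=0):
    """Toy model: filter rows, route survivors to pseudo-random workers,
    and pad every worker-to-worker message to exactly `slots_per_link` slots.

    Returns one inbox (list of rows and DUMMY entries) per worker.
    """
    rng = random.Random(seed)
    num_workers = len(partitions)
    inboxes = [[] for _ in range(num_workers)]

    for rows in partitions:
        # One outgoing buffer per destination worker.
        outgoing = [[] for _ in range(num_workers)]
        for row in rows:
            if predicate(row):
                outgoing[rng.randrange(num_workers)].append(row)

        for dest, buf in enumerate(outgoing):
            # Simplification: a real design must size the buffers so that
            # overflow is handled without leaking data-dependent behavior.
            if len(buf) > slots_per_link:
                raise OverflowError("slots_per_link too small for this toy run")
            # Pad so that every message has the same observable size.
            buf.extend([DUMMY] * (slots_per_link - len(buf)))
            inboxes[dest].extend(buf)

    return inboxes


if __name__ == "__main__":
    parts = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
    result = random_route_filter(parts, lambda x: x % 2 == 0, slots_per_link=10)
    for worker_id, inbox in enumerate(result):
        real = [r for r in inbox if r is not DUMMY]
        print(worker_id, len(inbox), len(real))
```

In this toy, setting `slots_per_link` to the partition size trivially rules out overflow but pads heavily. Random routing spreads rows roughly evenly, which is what makes much smaller buffers plausible, and reducing such padding is the concern the abstract describes.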


