Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, F. Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner, Sebastian Benjamin Wrede
{"title":"Apache SystemDS中的联邦数据准备,学习和调试","authors":"Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, F. Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner, Sebastian Benjamin Wrede","doi":"10.1145/3511808.3557162","DOIUrl":null,"url":null,"abstract":"Federated learning allows training machine learning (ML) models without central consolidation of the raw data. Variants of such federated learning systems enable privacy-preserving ML, and address data ownership and/or sharing constraints. However, existing work mostly adopt data-parallel parameter-server architectures for mini-batch training, require manual construction of federated runtime plans, and largely ignore the broad variety of data preparation, ML algorithms, and model debugging. Over the last years, we extended Apache SystemDS by an additional federated runtime backend for federated linear-algebra programs, federated parameter servers, and federated data preparation. In this paper, we share the system-level compiler and runtime integration, new features such as multi-tenant federated learning, selected federated primitives, multi-key homomorphic encryption, and our monitoring infrastructure. Our demonstrator showcases how composite ML pipelines can be compiled into federated runtime plans with low overhead.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Federated Data Preparation, Learning, and Debugging in Apache SystemDS\",\"authors\":\"Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, F. Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner, Sebastian Benjamin Wrede\",\"doi\":\"10.1145/3511808.3557162\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Federated learning allows training machine learning (ML) models without central consolidation of the raw data. Variants of such federated learning systems enable privacy-preserving ML, and address data ownership and/or sharing constraints. However, existing work mostly adopt data-parallel parameter-server architectures for mini-batch training, require manual construction of federated runtime plans, and largely ignore the broad variety of data preparation, ML algorithms, and model debugging. Over the last years, we extended Apache SystemDS by an additional federated runtime backend for federated linear-algebra programs, federated parameter servers, and federated data preparation. In this paper, we share the system-level compiler and runtime integration, new features such as multi-tenant federated learning, selected federated primitives, multi-key homomorphic encryption, and our monitoring infrastructure. 
Our demonstrator showcases how composite ML pipelines can be compiled into federated runtime plans with low overhead.\",\"PeriodicalId\":389624,\"journal\":{\"name\":\"Proceedings of the 31st ACM International Conference on Information & Knowledge Management\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 31st ACM International Conference on Information & Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3511808.3557162\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3511808.3557162","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Federated Data Preparation, Learning, and Debugging in Apache SystemDS
Federated learning allows training machine learning (ML) models without centrally consolidating the raw data. Variants of such federated learning systems enable privacy-preserving ML and address data ownership and/or sharing constraints. However, existing work mostly adopts data-parallel parameter-server architectures for mini-batch training, requires manual construction of federated runtime plans, and largely ignores the broad variety of data preparation, ML algorithms, and model debugging. Over the last few years, we have extended Apache SystemDS with an additional federated runtime backend for federated linear-algebra programs, federated parameter servers, and federated data preparation. In this paper, we share the system-level compiler and runtime integration, new features such as multi-tenant federated learning, selected federated primitives, multi-key homomorphic encryption, and our monitoring infrastructure. Our demonstrator showcases how composite ML pipelines can be compiled into federated runtime plans with low overhead.
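To make the federated programming model concrete, the following is a minimal sketch in SystemDS' DML scripting language of how a row-partitioned federated matrix can be declared and passed to a builtin ML algorithm; the worker addresses, file paths, and matrix dimensions are illustrative assumptions, not details taken from the paper.

    # Declare federated matrices whose rows reside on two federated workers.
    # Hostnames, ports, paths, and dimensions below are illustrative assumptions.
    X = federated(addresses=list("worker1:8001/data/X1.csv", "worker2:8001/data/X2.csv"),
          ranges=list(list(0, 0), list(500, 10), list(500, 0), list(1000, 10)));
    y = federated(addresses=list("worker1:8001/data/y1.csv", "worker2:8001/data/y2.csv"),
          ranges=list(list(0, 0), list(500, 1), list(500, 0), list(1000, 1)));

    # Train a linear regression model via the lm builtin; the compiler turns this
    # script into a federated runtime plan that pushes operations to the workers
    # holding the data partitions where possible.
    B = lm(X=X, y=y, reg=1e-3);
    write(B, "B.csv", format="csv");

The same DML script can also be compiled to local or distributed runtime plans when the inputs are not federated, which is the low-overhead, plan-level integration the demonstrator highlights.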