Practical Federated Learning Infrastructure for Privacy-Preserving Scientific Computing
Lesi Wang, Dongfang Zhao
2022 IEEE/ACM International Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), November 2022
DOI: 10.1109/AI4S56813.2022.00012
Citations: 4
Abstract
Federated learning (FL) is deemed a promising paradigm for privacy-preserving data analytics in collaborative scientific computing. However, an effective and easy-to-use FL infrastructure for scientific computing and high-performance computing (HPC) environments is still lacking. The objective of this position paper is two-fold. First, we identify three missing pieces of a scientific FL infrastructure: (i) a native MPI programming interface that can be seamlessly integrated into existing scientific applications, (ii) an independent data layer for the FL system such that the user can pick the persistent medium of her own choice, such as parallel file systems and multidimensional databases, and (iii) efficient encryption protocols that are optimized for scientific workflows. The second objective of this paper is to present a work-in-progress FL infrastructure, namely MPI-FL, which is implemented with PyTorch and MPI4py. We deploy MPI-FL on 1,000 CPU cores and evaluate it with four standard benchmarks: MNIST, Fashion-MNIST, CIFAR-10, and SVHN-extra. It is our hope that the scientific computing and HPC community will find MPI-FL useful for their FL-related projects.
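To make the core aggregation step concrete, the following is a minimal, hypothetical sketch of the federated-averaging (FedAvg) arithmetic that an MPI-based FL system along the lines of MPI-FL could perform each round. The function name and data layout are illustrative assumptions, not the actual MPI-FL API; in a real MPI deployment each rank would hold one client's parameters and the average would typically be computed with an Allreduce(SUM) over the parameter buffers followed by division by the number of ranks. Here the same arithmetic is simulated sequentially with plain Python lists so the sketch is self-contained.

```python
def federated_average(client_weights):
    """Element-wise average of per-client parameter vectors (FedAvg).

    client_weights: list of equal-length lists, one per client.
    Returns the averaged global parameter vector.
    """
    n_clients = len(client_weights)
    n_params = len(client_weights[0])
    return [
        sum(w[i] for w in client_weights) / n_clients
        for i in range(n_params)
    ]

# Three simulated clients, each holding a 4-parameter "model".
clients = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]
global_model = federated_average(clients)
print(global_model)  # [2.0, 3.0, 4.0, 5.0]
```

With MPI4py, the same round would replace the Python-level loop with a collective over rank-local buffers, which is what lets the aggregation scale to the 1,000-core deployments described above.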