End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets

IACR Cryptol. ePrint Arch.; Pub Date: 2023-10-01; DOI: 10.56553/popets-2023-0118

Gauri Gupta, Krithika Ramesh, Anwesh Bhattacharya, Divya Gupta, Rahul Sharma, Nishanth Chandran, Rijurekha Sen

Citations: 0

Abstract

Privacy-preserving machine learning (PPML) promises to train machine learning (ML) models by combining data spread across multiple data silos. In theory, secure multiparty computation (MPC) allows multiple data owners to train models on their joint data without revealing that data to each other. However, prior implementations of secure training with MPC have three limitations: they have been evaluated only on CNNs, ignoring LSTMs; fixed-point approximations have reduced training accuracy compared to training in floating point; and because secure training via MPC incurs significant latency overheads, its relevance to practical tasks with streaming data remains unclear. This work reports our experience addressing the practical problem of secure training and inference for urban sensing problems, e.g., traffic congestion estimation or air pollution monitoring in large cities, where data can be contributed by rival fleet companies, while balancing privacy-accuracy trade-offs using MPC-based techniques.

Our first contribution is a custom ML model for this task that can be trained efficiently with MPC within acceptable latency. In particular, we design a GCN-LSTM and securely train it on time-series sensor data for accurate forecasting, within 7 minutes per epoch. As our second contribution, we build an end-to-end system for private training and inference that provably matches the training accuracy of cleartext ML training. This work is the first to securely train a model with LSTM cells. Third, the trained model is kept secret-shared between the fleet companies, and clients can make sensitive queries to it; potentially invalid queries are handled carefully. Our custom protocols allow clients to query predictions from privately trained models in milliseconds while maintaining accuracy and cryptographic security.
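The abstract relies on two MPC building blocks without spelling them out. The first is keeping values (sensor readings, model weights) secret-shared between parties. The second is encoding real numbers in fixed point, which is where accuracy loss relative to floating point can arise. The Python sketch below illustrates both under stated assumptions: generic two-party additive secret sharing over the ring Z_{2^64}, with 12 fractional bits. The ring size, the precision, the sample readings, and all function names (`encode`, `decode`, `share`, `reconstruct`, `add_shares`, `fixed_mul`) are hypothetical choices for illustration. This is a textbook construction, not the paper's protocols, which the abstract does not describe. Secure multiplication and truncation of shared values require interactive protocols that this sketch does not implement. `fixed_mul` only shows, in cleartext, the rounding that truncation introduces.

```python
import secrets
from typing import Tuple

# Hypothetical parameters for illustration only; the abstract does not state
# the ring size or fixed-point precision used in the paper.
RING_BITS = 64
MOD = 1 << RING_BITS
FRAC_BITS = 12  # fractional bits in the fixed-point encoding


def encode(x: float) -> int:
    """Map a real number to a fixed-point ring element (two's complement mod 2^64)."""
    return round(x * (1 << FRAC_BITS)) % MOD


def to_signed(v: int) -> int:
    """Interpret the upper half of the ring as negative numbers."""
    return v - MOD if v >= MOD // 2 else v


def decode(v: int) -> float:
    """Map a ring element back to a real number."""
    return to_signed(v) / (1 << FRAC_BITS)


def share(v: int) -> Tuple[int, int]:
    """Split v into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (v - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares into the underlying ring element."""
    return (s0 + s1) % MOD


def add_shares(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Party i adds its own shares of a and b locally; no communication is needed."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def fixed_mul(a: int, b: int) -> int:
    """Cleartext fixed-point multiply. The raw product carries 2*FRAC_BITS
    fractional bits, so it is shifted right (floor) back to FRAC_BITS,
    discarding low-order bits. In MPC this step needs a dedicated protocol."""
    return (to_signed(a) * to_signed(b) >> FRAC_BITS) % MOD


if __name__ == "__main__":
    # Each rival fleet secret-shares one (illustrative) pollution reading.
    # Party 0 ends up holding the first share of each, party 1 the second.
    fleet_a = share(encode(41.7))
    fleet_b = share(encode(58.25))
    total = add_shares(fleet_a, fleet_b)
    print(decode(reconstruct(*total)))  # close to 99.95, up to fixed-point rounding
    print(decode(fixed_mul(encode(0.1), encode(0.1))))  # close to 0.01, with truncation error
```

Each share on its own is a uniformly random ring element, so neither fleet learns the other's reading from its share. In this demo only the reconstructed sum is revealed. The small gaps between the printed values and 99.95 and 0.01 are the kind of fixed-point error the abstract cites as a cause of reduced training accuracy in prior secure training.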
