协方差信息的可访问性在联合学习框架中造成漏洞。

IF 5.4 3区生物学 Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics Pub Date : 2023-09-02 DOI:10.1093/bioinformatics/btad531

Manuel Huth, Jonas Arruda, Roy Gusinow, Lorenzo Contento, Evelina Tacconelli, Jan Hasenauer

{"title":"协方差信息的可访问性在联合学习框架中造成漏洞。","authors":"Manuel Huth, Jonas Arruda, Roy Gusinow, Lorenzo Contento, Evelina Tacconelli, Jan Hasenauer","doi":"10.1093/bioinformatics/btad531","DOIUrl":null,"url":null,"abstract":"Motivation: Federated Learning (FL) is gaining traction in various fields as it enables integrative data analysis without sharing sensitive data, such as in healthcare. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that relies on being able to compute sample means, sample covariances, and construct known linearly independent vectors on the data owner side.Results: We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Additionally, the attack algorithm is robust to defense strategies that involve adding random noise. We demonstrate the limitations of existing frameworks and propose potential defense strategies analyzing the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.Availability and implementation: The code examples are provided at GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset, which we used in the manuscript, is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":" ","pages":""},"PeriodicalIF":5.4000,"publicationDate":"2023-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516515/pdf/","citationCount":"0","resultStr":"{\"title\":\"Accessibility of covariance information creates vulnerability in Federated Learning frameworks.\",\"authors\":\"Manuel Huth, Jonas Arruda, Roy Gusinow, Lorenzo Contento, Evelina Tacconelli, Jan Hasenauer\",\"doi\":\"10.1093/bioinformatics/btad531\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motivation: Federated Learning (FL) is gaining traction in various fields as it enables integrative data analysis without sharing sensitive data, such as in healthcare. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that relies on being able to compute sample means, sample covariances, and construct known linearly independent vectors on the data owner side.Results: We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Additionally, the attack algorithm is robust to defense strategies that involve adding random noise. We demonstrate the limitations of existing frameworks and propose potential defense strategies analyzing the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.Availability and implementation: The code examples are provided at GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset, which we used in the manuscript, is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).\",\"PeriodicalId\":8903,\"journal\":{\"name\":\"Bioinformatics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2023-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516515/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btad531\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btad531","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

动机：联合学习（FL）在各个领域都越来越受欢迎，因为它能够在不共享敏感数据的情况下进行综合数据分析，例如在医疗保健领域。但是，必须考虑恶意攻击导致数据泄露的风险。在这项研究中，我们介绍了一种新的攻击算法，该算法依赖于能够计算样本均值、样本协方差，并在数据所有者侧构造已知的线性无关向量。结果：我们表明，这些基本功能在几个已建立的FL框架中可用，足以重建受隐私保护的数据。此外，该攻击算法对涉及添加随机噪声的防御策略是鲁棒的。我们展示了现有框架的局限性，并提出了潜在的防御策略，分析了使用差异隐私的含义。本研究中提出的新见解将有助于FL框架的改进。可用性和实现：代码示例在GitHub上提供(https://github.com/manuhuth/Data-Leakage-From-Covariances.git)。我们在手稿中使用的CNSIM1数据集在DSData R包中可用(https://github.com/datashield/DSData/tree/main/data)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Accessibility of covariance information creates vulnerability in Federated Learning frameworks.

查看原文本刊更多论文

Accessibility of covariance information creates vulnerability in Federated Learning frameworks.

Motivation: Federated Learning (FL) is gaining traction in various fields as it enables integrative data analysis without sharing sensitive data, such as in healthcare. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that relies on being able to compute sample means, sample covariances, and construct known linearly independent vectors on the data owner side.

Results: We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Additionally, the attack algorithm is robust to defense strategies that involve adding random noise. We demonstrate the limitations of existing frameworks and propose potential defense strategies analyzing the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.

Availability and implementation: The code examples are provided at GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset, which we used in the manuscript, is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Bioinformatics 生物-生化研究方法

CiteScore

11.20

自引率

5.20%

发文量

753

审稿时长

2.1 months

期刊介绍： The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal - Discovery Notes and Application Notes- focus on shorter papers; the former reporting biologically interesting discoveries using computational methods, the latter exploring the applications used for experiments.