A Distributed Generative Adversarial Network for Data Augmentation Under Vertical Federated Learning

IF 7.5; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data Pub Date : 2024-03-12 DOI:10.1109/TBDATA.2024.3375150

Yunpeng Xiao;Xufeng Li;Tun Li;Rong Wang;Yucai Pang;Guoyin Wang

IEEE Transactions on Big Data, vol. 11, no. 1, pp. 74-85. Journal Article. Full text: https://ieeexplore.ieee.org/document/10463181/

Citations: 0

Abstract

Vertical federated learning aggregates the data features held by different participants. To address the problem of insufficient overlapping data in vertical federated learning, this study presents a generative adversarial network model that supports distributed data augmentation. First, because generative adversarial networks can generate simulated samples, the study proposes FeCGAN, a distributed generative adversarial network for multiple participants with insufficient overlapping data. The network works with multiple data sources and can augment each participant's local data. Second, to address the learning divergence caused by differing local distributions across data sources, the study proposes the aggregation algorithm FedKL, which aggregates the feedback of the local discriminators for interaction with the generator and learns the local data distributions more accurately. Finally, because nonoverlapping data cannot otherwise be used and is therefore wasted, the study proposes a data augmentation method called VFeDA, which uses FeCGAN to generate pseudo features, expands the set of overlapping data, and thereby improves data utilization. Experiments showed that the proposed model is suitable for multiple data sources and can generate high-quality data.
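To make the overlap problem concrete, the following is a minimal sketch of how two participants' samples split into overlapping and nonoverlapping sets, and how generated pseudo features could enlarge the aligned set. It is an illustration under stated assumptions, not the paper's implementation: the party names, sample IDs, feature values, and the `stand_in_generator` function are all hypothetical, and the generator is a trivial placeholder rather than FeCGAN.

```python
# Illustrative sketch only: party names, sample IDs, and `stand_in_generator`
# are hypothetical and are not taken from the paper.

def split_by_overlap(party_a, party_b):
    """Return (overlapping IDs, IDs only in A, IDs only in B), each sorted."""
    ids_a, ids_b = set(party_a), set(party_b)
    return sorted(ids_a & ids_b), sorted(ids_a - ids_b), sorted(ids_b - ids_a)


def stand_in_generator(known_features, width):
    """Placeholder for a trained conditional generator.

    Returns a constant vector (the mean of the known features) so the
    example runs without any training; a real generator would be learned.
    """
    mean = sum(known_features) / len(known_features)
    return [mean] * width


def augment(party_a, party_b):
    """Build a joined row for every sample ID, filling a missing side with pseudo features."""
    width_a = len(next(iter(party_a.values())))
    width_b = len(next(iter(party_b.values())))
    joined = {}
    for sid in sorted(set(party_a) | set(party_b)):
        feats_a = party_a.get(sid)
        feats_b = party_b.get(sid)
        if feats_a is None:
            feats_a = stand_in_generator(feats_b, width_a)
        if feats_b is None:
            feats_b = stand_in_generator(feats_a, width_b)
        joined[sid] = feats_a + feats_b
    return joined


# Each party holds different features, keyed by sample ID.
party_a = {"u1": [1.0, 2.0], "u2": [3.0, 5.0], "u3": [0.0, 4.0]}
party_b = {"u2": [10.0], "u3": [20.0], "u4": [30.0]}

overlap, only_a, only_b = split_by_overlap(party_a, party_b)
augmented = augment(party_a, party_b)
print(len(overlap), "aligned samples before,", len(augmented), "after augmentation")
```

In this toy setup only `u2` and `u3` are usable for joint training before augmentation; filling in the missing side for `u1` and `u4` makes all four samples available, which is the kind of data reuse the abstract attributes to VFeDA.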


Source journal: IEEE Transactions on Big Data

CiteScore: 11.80

Self-citation rate: 2.80%

Articles published: 114

About the journal: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.