Yunpeng Xiao;Xufeng Li;Tun Li;Rong Wang;Yucai Pang;Guoyin Wang
{"title":"A Distributed Generative Adversarial Network for Data Augmentation Under Vertical Federated Learning","authors":"Yunpeng Xiao;Xufeng Li;Tun Li;Rong Wang;Yucai Pang;Guoyin Wang","doi":"10.1109/TBDATA.2024.3375150","DOIUrl":null,"url":null,"abstract":"Vertical federated learning can aggregate participant data features. To address the issue of insufficient overlapping data in vertical federated learning, this study presents a generative adversarial network model that allows distributed data augmentation. First, this study proposes a distributed generative adversarial network FeCGAN for multiple participants with insufficient overlapping data, considering the fact that the generative adversarial network can generate simulation samples. This network is suitable for multiple data sources and can augment participants’ local data. Second, to address the problem of learning divergence caused by different local distributions of multiple data sources, this study proposes the aggregation algorithm FedKL. It aggregates the feedback of the local discriminator to interact with the generator and learns the local data distribution more accurately. Finally, given the problem of data waste caused by the unavailability of nonoverlapping data, this study proposes a data augmentation method called VFeDA. It uses FeCGAN to generate pseudo features and expands more overlapping data, thereby improving the data use. Experiments showed that the proposed model is suitable for multiple data sources and can generate high-quality data.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 1","pages":"74-85"},"PeriodicalIF":7.5000,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10463181/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Vertical federated learning can aggregate participant data features. To address the issue of insufficient overlapping data in vertical federated learning, this study presents a generative adversarial network model that allows distributed data augmentation. First, this study proposes a distributed generative adversarial network FeCGAN for multiple participants with insufficient overlapping data, considering the fact that the generative adversarial network can generate simulation samples. This network is suitable for multiple data sources and can augment participants’ local data. Second, to address the problem of learning divergence caused by different local distributions of multiple data sources, this study proposes the aggregation algorithm FedKL. It aggregates the feedback of the local discriminator to interact with the generator and learns the local data distribution more accurately. Finally, given the problem of data waste caused by the unavailability of nonoverlapping data, this study proposes a data augmentation method called VFeDA. It uses FeCGAN to generate pseudo features and expands more overlapping data, thereby improving the data use. Experiments showed that the proposed model is suitable for multiple data sources and can generate high-quality data.
期刊介绍:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.