Privacy-Preserving Vertical Federated Learning With Tensor Decomposition for Data Missing Features

IF 6.3 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security, Pub Date: 2025-03-18, DOI: 10.1109/TIFS.2025.3552033

Tianchi Liao; Lele Fu; Lei Zhang; Lei Yang; Chuan Chen; Michael K. Ng; Huawei Huang; Zibin Zheng

Volume 20, pages 3445-3460. Journal Article. https://ieeexplore.ieee.org/document/10929023/

Citations: 0

Abstract

Vertical federated learning (VFL) allows parties to build robust shared machine learning models by learning from distributed features of the same samples, without exposing their own data. However, current VFL solutions are limited in their ability to perform inference on non-overlapping samples, and data stored on clients is often lost due to various unavoidable factors. This leaves client data incomplete, yet client missing features (MF) are frequently overlooked in VFL. This paper proposes a VFL framework for handling missing features (MFVFL), a tensor decomposition network-based approach that learns intra- and inter-client feature information from client data with missing features to improve VFL performance. In MFVFL, each client imputes missing values and encodes features to learn intra-feature information, and the server collects the uploaded feature embeddings as input to a low-rank tensor decomposition network that learns inter-feature information. Finally, the server aggregates the representations from the tensor decomposition to train a global classifier. We theoretically guarantee the convergence of MFVFL. In addition, differential privacy (DP) is commonly used for data privacy protection, and the proposed framework (MFVFL-DP) can handle the resulting noise-degraded data by using tensor robust PCA to alleviate the impact of noise while preserving data privacy. We conduct extensive experiments on six datasets of different sample sizes and feature dimensions and demonstrate that MFVFL significantly outperforms state-of-the-art methods, especially under high missing ratios. The experimental results also show that MFVFL-DP has excellent denoising capabilities and that the noise introduced by the DP mechanism can be alleviated.
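
The client-side step described in the abstract (impute missing values, then upload a representation that may carry DP noise) can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the paper's method: column-mean imputation stands in for the imputation step, the imputed rows stand in for learned embeddings (no encoder is trained), and the noise follows the classic Gaussian mechanism with L2 clipping. The function names and parameter values are hypothetical, and the server-side tensor decomposition network and tensor robust PCA are not sketched here.

```python
import math
import random


def impute_column_means(rows):
    """Fill missing entries (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 when a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism (stated for 0 < epsilon < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def privatize(vec, max_norm, epsilon, delta, rng):
    """Clip a vector, then add i.i.d. Gaussian noise calibrated to its sensitivity."""
    clipped = clip_l2(vec, max_norm)
    # Assumption: any two clipped vectors differ by at most 2 * max_norm in L2 norm.
    sigma = gaussian_sigma(2.0 * max_norm, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in clipped]


# Toy client table: 3 samples x 2 local features, None marks a missing value.
client_rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
imputed = impute_column_means(client_rows)

# Stand-in for the upload step: each imputed row is treated as an "embedding".
rng = random.Random(0)
noisy = [privatize(row, max_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
         for row in imputed]
```

With a small epsilon the added noise can dominate the clipped values, which is the kind of degradation the abstract says MFVFL-DP mitigates on the server side.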

Source journal

IEEE Transactions on Information Forensics and Security (Engineering Technology, Engineering: Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 7.40%

Articles published: 234

Review time: 6.5 months

About the journal: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.