Tianchi Liao;Lele Fu;Lei Zhang;Lei Yang;Chuan Chen;Michael K. Ng;Huawei Huang;Zibin Zheng
DOI: 10.1109/TIFS.2025.3552033
Journal: IEEE Transactions on Information Forensics and Security, vol. 20, pp. 3445-3460 (impact factor 6.3, JCR Q1, Computer Science, Theory & Methods)
Published: 2025-03-18
URL: https://ieeexplore.ieee.org/document/10929023/
Privacy-Preserving Vertical Federated Learning With Tensor Decomposition for Data Missing Features
Vertical federated learning (VFL) allows parties to build robust shared machine learning models by learning from distributed features of the same samples, without exposing their own data. However, current VFL solutions are limited in their ability to perform inference on non-overlapping samples, and data stored on clients is often subject to loss due to various unavoidable factors. This leads to incomplete client data, yet missing features (MF) on clients are frequently overlooked in VFL. The main aim of this paper is to propose a VFL framework that handles missing features (MFVFL), a tensor decomposition network-based approach that can effectively learn intra- and inter-client feature information from client data with missing features to improve VFL performance. In the proposed MFVFL method, each client imputes missing values and encodes features to learn intra-feature information, and the server collects the uploaded feature embeddings as input to our low-rank tensor decomposition network to learn inter-feature information. Finally, the server aggregates the representations from the tensor decomposition to train a global classifier. We theoretically guarantee the convergence of MFVFL. In addition, because differential privacy (DP) is commonly used for data privacy protection, the proposed framework (MFVFL-DP) handles the resulting degraded data by using tensor robust PCA to alleviate the impact of noise while preserving data privacy. We conduct extensive experiments on six datasets of different sample sizes and feature dimensions and demonstrate that MFVFL significantly outperforms state-of-the-art methods, especially under high missing ratios. The experimental results also show that MFVFL-DP has excellent denoising capabilities and that the noise introduced by the DP mechanism can be alleviated.
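The data flow described in the abstract (client-side imputation and encoding, then a server-side low-rank step over the stacked embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' method. Column-mean imputation, the one-layer `tanh` encoder, and the truncated SVD are simple stand-ins for the paper's imputation, encoders, and tensor decomposition network, and every name and size below is hypothetical.

```python
import numpy as np

def impute_mean(x):
    """Replace NaNs in each column with that column's observed mean."""
    col_mean = np.nanmean(x, axis=0)
    # Fall back to 0 if a column has no observed values at all.
    col_mean = np.where(np.isnan(col_mean), 0.0, col_mean)
    return np.where(np.isnan(x), col_mean, x)

def client_embed(x, weight):
    """Hypothetical client encoder: impute, then one tanh layer."""
    return np.tanh(impute_mean(x) @ weight)

def low_rank_approx(tensor, rank):
    """Stand-in for the low-rank step: truncated SVD of the
    sample-mode unfolding, folded back to (clients, samples, dim)."""
    k, n, d = tensor.shape
    unfolded = tensor.transpose(1, 0, 2).reshape(n, k * d)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return approx.reshape(n, k, d).transpose(1, 0, 2)

rng = np.random.default_rng(0)
n_samples, emb_dim = 8, 4
feature_dims = [5, 3, 6]  # assumed: three clients with different feature counts

embeddings = []
for dim in feature_dims:
    x = rng.normal(size=(n_samples, dim))
    x[rng.random(x.shape) < 0.3] = np.nan  # simulate roughly 30% missing entries
    w = rng.normal(size=(dim, emb_dim))    # untrained, random encoder weights
    embeddings.append(client_embed(x, w))

# Server side: stack the uploaded embeddings into a 3-way tensor.
stacked = np.stack(embeddings)  # shape (clients, samples, emb_dim)
low_rank = low_rank_approx(stacked, rank=2)

# Concatenate per-client representations as input to a global classifier.
server_input = low_rank.transpose(1, 0, 2).reshape(n_samples, -1)
```

In this sketch the clients share only embeddings, not raw features, and `server_input` has one row per sample with the low-rank representations of all clients concatenated. Training the encoders and the classifier jointly, as the paper does, is left out.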
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.