Improving self-supervised vertical federated learning with contrastive instance-wise similarity and dynamical balance pool

Impact Factor 6.2 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, THEORY & METHODS
Shuai Chen, Wenyu Zhang, Xiaoling Huang, Cheng Zhang, Qingjun Mao
{"title":"利用对比实例相似性和动态平衡池改进自监督垂直联邦学习","authors":"Shuai Chen ,&nbsp;Wenyu Zhang ,&nbsp;Xiaoling Huang ,&nbsp;Cheng Zhang ,&nbsp;Qingjun Mao","doi":"10.1016/j.future.2025.107884","DOIUrl":null,"url":null,"abstract":"<div><div>Vertical Federated Learning (VFL) enables multiple parties with distinct feature spaces to train a joint VFL model collaboratively without exposing their original private data. In realistic scenarios, the scarcity of aligned and labeled samples among collaborating participants limits the effectiveness of traditional VFL approaches for model training. Current VFL frameworks attempt to leverage abundant unlabeled data using Contrastive Self-Supervised Learning (CSSL). However, the simplistic incorporation of CSSL methods cannot address severe domain shift in VFL. In addition, CSSL methods typically conflict with general regularization approaches designed to alleviate domain shift, thereby significantly limiting the potential of the self-supervised learning framework in VFL. To address these challenges, this study proposes an Improved Self-Supervised Vertical Federated Learning (ISSVFL) framework for VFL in label-scarce scenarios under the semi-honest and no-collusion assumption. ISSVFL merges CSSL with instance-wise similarity to resolve regularization conflicts and captures more significant inter-domain knowledge in the representations from different participants, effectively alleviating domain shift. In addition, a new dynamical balance pool is proposed to fine-tune the pre-trained models for downstream supervised tasks by dynamically balancing inter-domain and intra-domain knowledge. Extensive empirical experiments on image and tabular datasets demonstrate that ISSVFL achieves an average performance improvement of 3.3 % compared with state-of-the-art baselines.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107884"},"PeriodicalIF":6.2000,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving self-supervised vertical federated learning with contrastive instance-wise similarity and dynamical balance pool\",\"authors\":\"Shuai Chen ,&nbsp;Wenyu Zhang ,&nbsp;Xiaoling Huang ,&nbsp;Cheng Zhang ,&nbsp;Qingjun Mao\",\"doi\":\"10.1016/j.future.2025.107884\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Vertical Federated Learning (VFL) enables multiple parties with distinct feature spaces to train a joint VFL model collaboratively without exposing their original private data. In realistic scenarios, the scarcity of aligned and labeled samples among collaborating participants limits the effectiveness of traditional VFL approaches for model training. Current VFL frameworks attempt to leverage abundant unlabeled data using Contrastive Self-Supervised Learning (CSSL). However, the simplistic incorporation of CSSL methods cannot address severe domain shift in VFL. In addition, CSSL methods typically conflict with general regularization approaches designed to alleviate domain shift, thereby significantly limiting the potential of the self-supervised learning framework in VFL. To address these challenges, this study proposes an Improved Self-Supervised Vertical Federated Learning (ISSVFL) framework for VFL in label-scarce scenarios under the semi-honest and no-collusion assumption. 
ISSVFL merges CSSL with instance-wise similarity to resolve regularization conflicts and captures more significant inter-domain knowledge in the representations from different participants, effectively alleviating domain shift. In addition, a new dynamical balance pool is proposed to fine-tune the pre-trained models for downstream supervised tasks by dynamically balancing inter-domain and intra-domain knowledge. Extensive empirical experiments on image and tabular datasets demonstrate that ISSVFL achieves an average performance improvement of 3.3 % compared with state-of-the-art baselines.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"172 \",\"pages\":\"Article 107884\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-05-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X25001797\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25001797","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

Vertical Federated Learning (VFL) enables multiple parties with distinct feature spaces to train a joint VFL model collaboratively without exposing their original private data. In realistic scenarios, the scarcity of aligned and labeled samples among collaborating participants limits the effectiveness of traditional VFL approaches for model training. Current VFL frameworks attempt to leverage abundant unlabeled data using Contrastive Self-Supervised Learning (CSSL). However, the simplistic incorporation of CSSL methods cannot address severe domain shift in VFL. In addition, CSSL methods typically conflict with general regularization approaches designed to alleviate domain shift, thereby significantly limiting the potential of the self-supervised learning framework in VFL. To address these challenges, this study proposes an Improved Self-Supervised Vertical Federated Learning (ISSVFL) framework for VFL in label-scarce scenarios under the semi-honest and no-collusion assumption. ISSVFL merges CSSL with instance-wise similarity to resolve regularization conflicts and captures more significant inter-domain knowledge in the representations from different participants, effectively alleviating domain shift. In addition, a new dynamical balance pool is proposed to fine-tune the pre-trained models for downstream supervised tasks by dynamically balancing inter-domain and intra-domain knowledge. Extensive empirical experiments on image and tabular datasets demonstrate that ISSVFL achieves an average performance improvement of 3.3 % compared with state-of-the-art baselines.
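The abstract does not include an implementation, but the central idea it describes, pairing a cross-party contrastive objective with an instance-wise similarity term over the participants' representations, can be illustrated with a minimal sketch. Everything below (the tensor names, the temperature, the weighting coefficient `lambda_sim`, and the choice of an MSE penalty on similarity matrices) is an illustrative assumption, not the authors' actual formulation.

```python
# Minimal, self-contained sketch (not the paper's code): a cross-party
# contrastive loss combined with an instance-wise similarity term.
# The temperature, lambda_sim, and the MSE form of the similarity term
# are illustrative assumptions.
import torch
import torch.nn.functional as F


def cross_party_contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: the two parties' embeddings of the same aligned
    sample form a positive pair; other samples in the batch are negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature        # (N, N) cross-party similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetrize over the A->B and B->A directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def instance_similarity_loss(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Instance-wise similarity term: both parties should induce the same
    pairwise similarity structure over the batch."""
    sim_a = F.normalize(z_a, dim=1) @ F.normalize(z_a, dim=1).t()
    sim_b = F.normalize(z_b, dim=1) @ F.normalize(z_b, dim=1).t()
    return F.mse_loss(sim_a, sim_b)


def pretraining_loss(z_a, z_b, lambda_sim: float = 1.0) -> torch.Tensor:
    return (cross_party_contrastive_loss(z_a, z_b)
            + lambda_sim * instance_similarity_loss(z_a, z_b))


# Toy usage: 32 aligned samples, each party's bottom model outputs a
# 128-dimensional embedding of its own feature slice.
z_party_a = torch.randn(32, 128, requires_grad=True)
z_party_b = torch.randn(32, 128, requires_grad=True)
loss = pretraining_loss(z_party_a, z_party_b)
loss.backward()
print(float(loss))
```

In an actual VFL deployment each party would compute its embeddings locally with its own bottom model and exchange only those embeddings (or gradients on them) with the coordinating party; this single-process sketch does not model that communication, nor the dynamical balance pool used during fine-tuning.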
Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.