Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques

IF 6.4 2区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS

Journal of Big Data Pub Date : 2024-05-28 DOI:10.1186/s40537-024-00933-6

Francesco Folino, Gianluigi Folino, Francesco Sergio Pisani, Luigi Pontieri, Pietro Sabatino

{"title":"Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques","authors":"Francesco Folino, Gianluigi Folino, Francesco Sergio Pisani, Luigi Pontieri, Pietro Sabatino","doi":"10.1186/s40537-024-00933-6","DOIUrl":null,"url":null,"abstract":"<p>In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and computation/communication costs in both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown the proposed approach to ensure a better balance between efficiency and model accuracy, compared to other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses competitors even though it utilizes only 50% and 75% of the training set, which is fully utilized by the other approaches in the competition. This method achieves reductions in the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase our framework’s potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"23 1","pages":""},"PeriodicalIF":6.4000,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Big Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s40537-024-00933-6","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and computation/communication costs in both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown the proposed approach to ensure a better balance between efficiency and model accuracy, compared to other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses competitors even though it utilizes only 50% and 75% of the training set, which is fully utilized by the other approaches in the competition. This method achieves reductions in the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase our framework’s potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner.

Abstract Image

查看原文本刊更多论文

结合数据缩减和条件计算技术，高效实现垂直联合学习

本文提出了一个基于稀疏专家混合物（MoE）架构的框架，用于分布式分类模型在不同领域（如网络安全和医疗保健）的联合学习和应用，在这些领域中，联合体的不同成员为大量数据实例存储了不同的特征子集。该框架旨在限制模型训练（通过数据采样）和应用（利用稀疏 MoE 的条件计算能力）中的信息泄露风险和计算/通信成本。对真实数据的实验表明，与其他基于 VFL 的解决方案相比，所提出的方法能确保在效率和模型准确性之间取得更好的平衡。值得注意的是，在以恶意软件分类为重点的真实网络安全案例研究（KronoDroid 数据集）中，所提出的方法超越了竞争对手，尽管它只使用了训练集的 50%和 75%，而其他方法在竞争中已经充分利用了训练集。该方法的误报率分别降低了 16.9% 和 18.2%，在其他评估指标上也取得了令人满意的结果。这些结果表明，我们的框架具有以协作而安全的方式显著提高网络安全威胁检测和预防能力的潜力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Big Data Computer Science-Information Systems

CiteScore

17.80

自引率

3.70%

发文量

105

审稿时长

13 weeks

期刊介绍： The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.