Split Learning on Segmented Healthcare Data

IF 5.7 | Zone 3, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data | Pub Date: 2025-03-31 | DOI: 10.1109/TBDATA.2025.3556639

Ling Hu; Tongqing Zhou; Zhihuang Liu; Fang Liu; Zhiping Cai

Vol. 11, No. 5, pp. 2749-2763 | https://ieeexplore.ieee.org/document/10946173/

Citations: 0

Abstract

Sequential data learning is vital to harnessing the rich knowledge these data contain for diverse downstream tasks, particularly in healthcare (e.g., disease prediction). Because such data are sensitive, privacy-preserving learning methods based on federated learning (FL) and split learning (SL) have been widely investigated. Yet this work identifies, for the first time, that existing methods overlook how sequential data are generated by different patients at different times and stored in different hospitals, and therefore fail to learn the sequential correlations between different temporal segments. To fill this gap, a novel distributed learning framework, STSL, is proposed that trains a model on the segments in temporal order. Because patients have different visit sequences, STSL first performs privacy-preserving visit ordering based on a secure multi-party computation mechanism. Batch scheduling then groups patients with similar visit (sub-)sequences into the same training batch, facilitating subsequent split learning on batches. The scheduling process is formulated as an NP-hard optimization problem that balances learning loss against efficiency, and a greedy solution is presented. Theoretical analysis proves the privacy-preservation property of STSL. Experimental results on real-world eICU data show that it outperforms FL and SL in both accuracy (5% to 28% higher) and efficiency (a 75% reduction in communication costs).
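The abstract describes batch scheduling only at a high level. Below is a minimal Python sketch of one possible greedy heuristic for grouping patients with similar visit sequences; it is not the authors' STSL algorithm. It assumes visit sequences have already been ordered (the abstract says STSL does this with secure multi-party computation, which is not modeled here), represents each sequence as a tuple of hospital IDs, and measures similarity by shared prefix length. The names `greedy_batches`, `common_prefix_len`, `lost_visits`, and the `min_shared` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a patient's visits, already in time order,
# as hospital IDs, e.g. ("H1", "H3") means H1 first, then H3.
VisitSeq = Tuple[str, ...]
Batch = Tuple[List[str], VisitSeq]  # (patient IDs, shared hospital route)


def common_prefix_len(a: VisitSeq, b: VisitSeq) -> int:
    """Number of leading hospitals that two visit sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def greedy_batches(seqs: Dict[str, VisitSeq], batch_size: int,
                   min_shared: int = 1) -> List[Batch]:
    """Greedily group patients whose visit sequences share long prefixes.

    Each batch gets a route: the hospital order shared by all its members,
    i.e. the order in which that batch could be passed between hospitals.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    unassigned = set(seqs)
    batches: List[Batch] = []
    while unassigned:
        # Seed with the longest remaining sequence; ties broken by smallest ID.
        seed = min(unassigned, key=lambda p: (-len(seqs[p]), p))
        unassigned.remove(seed)
        # Only consider patients sharing at least `min_shared` leading hospitals.
        candidates = [p for p in unassigned
                      if common_prefix_len(seqs[seed], seqs[p]) >= min_shared]
        # Prefer the candidates that share the most of the seed's route.
        candidates.sort(key=lambda p: (-common_prefix_len(seqs[seed], seqs[p]), p))
        members = [seed] + candidates[: batch_size - 1]
        unassigned.difference_update(members)
        # The batch route is truncated to the prefix every member shares.
        route_len = min(common_prefix_len(seqs[seed], seqs[p]) for p in members)
        batches.append((members, seqs[seed][:route_len]))
    return batches


def lost_visits(seqs: Dict[str, VisitSeq], batches: List[Batch]) -> int:
    """Visits outside their batch's route (a crude stand-in for learning loss)."""
    return sum(len(seqs[p]) - len(route) for members, route in batches for p in members)


if __name__ == "__main__":
    # Toy data: five patients, three hospitals (illustrative only, not eICU data).
    demo = {
        "p1": ("H1", "H2", "H3"),
        "p2": ("H1", "H2"),
        "p3": ("H1", "H2", "H3"),
        "p4": ("H2", "H1"),
        "p5": ("H2",),
    }
    for members, route in greedy_batches(demo, batch_size=2):
        print(members, route)
    # ['p1', 'p3'] ('H1', 'H2', 'H3')
    # ['p2'] ('H1', 'H2')
    # ['p4', 'p5'] ('H2',)
```

In this sketch, raising `min_shared` keeps more of each patient's visits inside the batch route (fewer lost visits) at the cost of more batches, used here as a rough proxy for communication rounds. That loosely mirrors the loss-versus-efficiency trade-off the abstract mentions; the paper's actual objective and cost terms are not given in this summary.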




Source journal: IEEE Transactions on Big Data

CiteScore: 11.80

Self-citation rate: 2.80%

Articles published: 114

Journal description: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas span topics such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields that generate massive datasets.