Ling Hu;Tongqing Zhou;Zhihuang Liu;Fang Liu;Zhiping Cai
{"title":"Split Learning on Segmented Healthcare Data","authors":"Ling Hu;Tongqing Zhou;Zhihuang Liu;Fang Liu;Zhiping Cai","doi":"10.1109/TBDATA.2025.3556639","DOIUrl":null,"url":null,"abstract":"Sequential data learning is vital to harnessing the encompassed rich knowledge for diverse downstream tasks, particularly in healthcare (e.g., disease prediction). Considering data sensitiveness, privacy-preserving learning methods, based on federated learning (FL) and split learning (SL), have been widely investigated. Yet, this work identifies, for the first time, existing methods overlook that sequential data are generated by different patients at different times and stored in different hospitals, failing to learn the sequential correlations between different temporal segments. To fill this void, a novel distributed learning framework <monospace>STSL</monospace> is proposed by training a model on the segments in order. Considering that patients have different visit sequences, <monospace>STSL</monospace> first implements privacy-preserving visit ordering based on a secure multi-party computation mechanism. Then batch scheduling participates patients with similar visit (sub-)sequences into the same training batch, facilitating subsequent split learning on batches. The scheduling process is formulated as an NP-hard optimization problem on balancing learning loss and efficiency and a greedy-based solution is presented. Theoretical analysis proves the privacy preservation property of <monospace>STSL</monospace>. Experimental results on real-world eICU data show its superior performance compared with FL and SL (<inline-formula><tex-math>$5\\% \\sim 28\\%$</tex-math></inline-formula> better accuracy) and effectiveness (a remarkable 75% reduction in communication costs).","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2749-2763"},"PeriodicalIF":5.7000,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10946173/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Sequential data learning is vital to harnessing the encompassed rich knowledge for diverse downstream tasks, particularly in healthcare (e.g., disease prediction). Considering data sensitiveness, privacy-preserving learning methods, based on federated learning (FL) and split learning (SL), have been widely investigated. Yet, this work identifies, for the first time, existing methods overlook that sequential data are generated by different patients at different times and stored in different hospitals, failing to learn the sequential correlations between different temporal segments. To fill this void, a novel distributed learning framework STSL is proposed by training a model on the segments in order. Considering that patients have different visit sequences, STSL first implements privacy-preserving visit ordering based on a secure multi-party computation mechanism. Then batch scheduling participates patients with similar visit (sub-)sequences into the same training batch, facilitating subsequent split learning on batches. The scheduling process is formulated as an NP-hard optimization problem on balancing learning loss and efficiency and a greedy-based solution is presented. Theoretical analysis proves the privacy preservation property of STSL. Experimental results on real-world eICU data show its superior performance compared with FL and SL ($5\% \sim 28\%$ better accuracy) and effectiveness (a remarkable 75% reduction in communication costs).
期刊介绍:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.