{"title":"医疗保健应用中并行分割学习的数据分布感知聚类","authors":"Md. Tanvir Arafat , Md. Abdur Razzaque , Abdulhameed Alelaiwi , Md. Zia Uddin , Mohammad Mehedi Hassan","doi":"10.1016/j.future.2025.107911","DOIUrl":null,"url":null,"abstract":"<div><div>Split learning, a promising approach in privacy-preserving machine learning, decentralizes model training by dividing it among client devices and a central server. However, split learning has exhibited a certain level of slowness in its vanilla approach, mainly due to the serial processing of devices. Recent research endeavors have addressed this challenge by introducing parallelism and thus accelerating the split learning process. However, the existing split learning methodologies often overlook the critical aspect of data distribution among client devices.</div><div>This paper introduces a Data Distribution Aware Clustering-based Parallel Split Learning (DCSL), a scheme purposefully crafted to address the complexities stemming from non-identically and non-independently distributed (non-IID) data among client devices engaged in the split learning paradigm. In healthcare applications, comprehending the intricacies of data distribution is imperative, particularly given the non-IID nature of medical datasets, to ensure accurate analysis and decision-making. The DCSL leverages a novel clustering technique to create clusters of medical client devices, considering the data distributions of their local datasets, and employs parallel model training within the device clusters. It enhances model convergence and reduces training latency by optimizing the cluster formation. 
Extensive experiments demonstrate that DCSL outperforms traditional split learning approaches, significantly improving accuracy and reducing training latency across various applications.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"174 ","pages":"Article 107911"},"PeriodicalIF":6.2000,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data distribution aware clustering for parallel split learning in healthcare applications\",\"authors\":\"Md. Tanvir Arafat , Md. Abdur Razzaque , Abdulhameed Alelaiwi , Md. Zia Uddin , Mohammad Mehedi Hassan\",\"doi\":\"10.1016/j.future.2025.107911\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Split learning, a promising approach in privacy-preserving machine learning, decentralizes model training by dividing it among client devices and a central server. However, split learning has exhibited a certain level of slowness in its vanilla approach, mainly due to the serial processing of devices. Recent research endeavors have addressed this challenge by introducing parallelism and thus accelerating the split learning process. However, the existing split learning methodologies often overlook the critical aspect of data distribution among client devices.</div><div>This paper introduces a Data Distribution Aware Clustering-based Parallel Split Learning (DCSL), a scheme purposefully crafted to address the complexities stemming from non-identically and non-independently distributed (non-IID) data among client devices engaged in the split learning paradigm. In healthcare applications, comprehending the intricacies of data distribution is imperative, particularly given the non-IID nature of medical datasets, to ensure accurate analysis and decision-making. 
The DCSL leverages a novel clustering technique to create clusters of medical client devices, considering the data distributions of their local datasets, and employs parallel model training within the device clusters. It enhances model convergence and reduces training latency by optimizing the cluster formation. Extensive experiments demonstrate that DCSL outperforms traditional split learning approaches, significantly improving accuracy and reducing training latency across various applications.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"174 \",\"pages\":\"Article 107911\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X25002067\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25002067","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Data distribution aware clustering for parallel split learning in healthcare applications
Split learning, a promising approach to privacy-preserving machine learning, decentralizes model training by dividing it between client devices and a central server. However, vanilla split learning is slow, mainly because client devices are processed serially. Recent work has addressed this bottleneck by introducing parallelism to accelerate training. Existing parallel split learning methods, however, often overlook a critical factor: how data is distributed among client devices.
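The cut-layer idea behind split learning, and why the vanilla variant is serial, can be sketched as follows. This is a minimal illustration, not the paper's architecture: the two-layer model, the weight names, and the client count are all assumptions made for the example. The client computes activations up to the cut layer and sends only those "smashed" representations to the server, which completes the forward pass; in the vanilla scheme, the server handles clients one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy split: the client holds the first layer, the server the rest.
W_client = rng.normal(size=(4, 8))   # client-side weights (input -> cut layer)
W_server = rng.normal(size=(8, 2))   # server-side weights (cut layer -> output)

def client_forward(x):
    """Client computes up to the cut layer; only activations leave the device."""
    return np.maximum(x @ W_client, 0.0)  # ReLU at the cut layer

def server_forward(smashed):
    """Server completes the forward pass from the received activations."""
    return smashed @ W_server

# Vanilla split learning: clients are processed one after another (serially),
# which is the latency bottleneck that parallel variants remove.
clients = [rng.normal(size=(3, 4)) for _ in range(5)]  # 5 clients' mini-batches
outputs = [server_forward(client_forward(x)) for x in clients]
```

Raw inputs never reach the server, only cut-layer activations, which is the privacy argument for split learning; parallel variants let the server process such activations from many clients concurrently.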
This paper introduces Data Distribution Aware Clustering-based Parallel Split Learning (DCSL), a scheme designed to address the complexities arising from non-identically and non-independently distributed (non-IID) data among client devices in the split learning paradigm. In healthcare applications, understanding data distribution is imperative for accurate analysis and decision-making, particularly given the non-IID nature of medical datasets. DCSL leverages a novel clustering technique that groups medical client devices by the data distributions of their local datasets, and employs parallel model training within each device cluster. By optimizing cluster formation, it improves model convergence and reduces training latency. Extensive experiments demonstrate that DCSL outperforms traditional split learning approaches, significantly improving accuracy and reducing training latency across various applications.
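The abstract does not spell out DCSL's clustering technique, so the sketch below shows only the generic idea of distribution-aware client grouping: summarize each client's local data distribution as a label histogram and cluster clients whose histograms are similar. The label counts, the number of classes, and the choice of plain k-means are all assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

def label_histogram(labels, num_classes):
    """Summarize a client's local data distribution as a normalized label histogram."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

# Hypothetical non-IID clients: three skewed toward class 0, three toward class 2.
client_labels = [rng.choice(3, size=50, p=p)
                 for p in [(.8, .1, .1)] * 3 + [(.1, .1, .8)] * 3]
H = np.array([label_histogram(y, 3) for y in client_labels])

def cluster_clients(H, k=2, iters=10):
    """Plain k-means on histogram vectors, with deterministic farthest-first init."""
    centers = [H[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(H - c, axis=1) for c in centers], axis=0)
        centers.append(H[int(d.argmax())])  # pick the point farthest from all centers
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(H[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = H[assign == j].mean(axis=0)
    return assign

clusters = cluster_clients(H, k=2)
```

Clients with similar label skew land in the same cluster; parallel model training can then run within each cluster, which is the structural idea the abstract describes.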
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.