Distributed collaborative machine learning in real-world application scenario: A white blood cell subtypes classification case study

IF 4.2 · CAS Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)

Image and Vision Computing · Pub Date: 2025-08-11 · DOI: 10.1016/j.imavis.2025.105673

Lorenzo Putzu, Simone Porcu, Andrea Loddo

Image and Vision Computing, volume 162, Article 105673. Not open access. https://www.sciencedirect.com/science/article/pii/S0262885625002616

Citations: 0

Abstract

White blood cell (WBC) subtype classification is a critical step in monitoring an individual’s health. However, it remains challenging because of the significant morphological variability of WBCs and the domain shift introduced by differing acquisition protocols across hospitals. Numerous approaches have been proposed to mitigate domain shift, including supervised and unsupervised domain adaptation, as well as domain generalisation. These methods, however, require either a sufficient number of representative target images, even if unlabelled, or a sufficient number of images from multiple sources, which may not be feasible under privacy regulations. In this study, we explore an alternative paradigm, known as Distributed Collaborative Machine Learning (DCML), which exploits images from different sources in a privacy-preserving setup. Although DCML methods seem well suited to this application, to the best of our knowledge they have not been used for this task or to address the above-mentioned issues. We argue that DCML deserves further consideration in medical imaging as a potential alternative solution to domain shift in a privacy-preserving setup. To substantiate our view, we consider three DCML methods: early fusion, late fusion and federated learning, each offering distinct trade-offs in terms of training constraints, computational overhead and communication costs. We then conduct an extensive cross-dataset experimental evaluation on four benchmark datasets and provide evidence that even simple implementations of DCML methods can effectively mitigate domain shift in WBC classification tasks.
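
To make two of the three DCML families named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes late fusion means averaging the class probabilities predicted by independently trained per-site models, and it assumes federated learning is realised as federated averaging (parameter averaging weighted by local sample counts). The function names `late_fusion` and `fed_avg` and all toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def late_fusion(prob_list):
    """Average per-site class-probability matrices (n_samples x n_classes)
    and return the predicted class index for each sample.
    Assumption: each site contributes predictions from its own local model."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1)

def fed_avg(site_params, site_sizes):
    """Federated-averaging style aggregation (assumed variant): weighted mean of
    per-site parameter vectors, weights proportional to local sample counts."""
    sizes = np.asarray(site_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * p for c, p in zip(coeffs, site_params))

# Toy data: two sites, two samples, two classes (illustrative values only).
probs_site_a = np.array([[0.7, 0.3], [0.4, 0.6]])
probs_site_b = np.array([[0.6, 0.4], [0.1, 0.9]])
fused_labels = late_fusion([probs_site_a, probs_site_b])

# Toy parameter vectors from two sites holding 100 and 300 images.
params_site_a = np.array([1.0, 2.0])
params_site_b = np.array([3.0, 4.0])
global_params = fed_avg([params_site_a, params_site_b], [100, 300])
```

In both cases only predictions or model parameters leave a site, never the images themselves, which is the privacy-preserving property the abstract emphasises. Early fusion is omitted here because the abstract does not specify how it is carried out.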


Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

8.50

Self-citation rate

8.50%

Articles published

143

Review time

7.8 months

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.