Asynchronous and focal federated learning for skin lesion classification under local data scarcity and class imbalance

IF 4.8; Zone 2, Medicine; JCR Q1, Computer Science, Interdisciplinary Applications

Computer methods and programs in biomedicine Pub Date : 2025-09-09 DOI:10.1016/j.cmpb.2025.109073

Shichao Ma , Yun-Hin Chan , Edith C.H. Ngai , Joshua W.K. Ho

Computer methods and programs in biomedicine, volume 272, Article 109073. Journal Article. https://www.sciencedirect.com/science/article/pii/S0169260725004900


Asynchronous and focal federated learning for skin lesion classification under local data scarcity and class imbalance

Background and Objectives

Federated learning (FL) enables the training of machine learning (ML) models on data from multiple data nodes without direct data transfer, which makes it a good choice for healthcare ML applications that must alleviate data privacy and security concerns. Most standard FL approaches focus on the setting of a small number of nodes, with each node contributing a sizable amount of data. However, in emerging healthcare settings such as telemedicine and the Internet of Medical Things (IoMT), it is necessary to consider the situation in which there are a large number of nodes, each contributing a relatively small number of data points (data scarcity) that are non-independent (class imbalance).
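To make the standard setting concrete, the following is a minimal sketch of sample-size-weighted parameter averaging in the style of FedAvg, a common synchronous baseline. The function name `fedavg_aggregate`, the flat-list representation of model weights, and the toy node values are assumptions made for illustration; none of them come from the paper.

```python
from typing import List, Sequence, Tuple


def fedavg_aggregate(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count.

    `updates` holds (weights, n_samples) pairs, one per participating node.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no samples contributed")
    dim = len(updates[0][0])
    agg = [0.0] * dim
    for weights, n in updates:
        for i, w in enumerate(weights):
            agg[i] += w * n / total
    return agg


# Three hypothetical nodes; the last one holds a single image.
global_weights = fedavg_aggregate(
    [([1.0, 0.0], 10), ([0.0, 1.0], 10), ([4.0, 4.0], 1)]
)
```

In the regime this abstract targets, most nodes would look like the third one: each local update is computed from very few examples, so it is a noisy estimate before it ever reaches the server.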

Methods

In this paper, we propose an asynchronous and focal update approach to enable FL to address this problem. In particular, we demonstrate its use in a teledermatology setting, in which a skin lesion image classifier is continuously updated based on data in a highly distributed network of mobile devices. We performed a simulation experiment in which 1,268 skin lesion images across 798 mobile devices contributed to the training of a 3-class classifier in an FL framework.
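The abstract does not spell out the update rule, so the sketch below is only one plausible reading and should not be taken as the authors' method. It assumes that "asynchronous" means the server blends in each node's weights as soon as they arrive, with a staleness-discounted mixing rate, and that "focal" refers to a focal-loss-style objective that down-weights easy examples. The names `focal_loss`, `async_mix`, `gamma`, and `base_rate` are hypothetical.

```python
import math
from typing import List


def focal_loss(probs: List[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def async_mix(global_w: List[float], client_w: List[float],
              staleness: int, base_rate: float = 0.5) -> List[float]:
    """Blend one client's weights into the global model as soon as they arrive.

    The mixing rate shrinks with staleness (rounds elapsed since the client
    last pulled the global model), so outdated updates move the model less.
    This discount schedule is an assumption, not a detail from the paper.
    """
    rate = base_rate / (1.0 + staleness)
    return [(1.0 - rate) * g + rate * c for g, c in zip(global_w, client_w)]


# A confidently correct prediction contributes far less loss than a poor one.
easy = focal_loss([0.9, 0.05, 0.05], target=0)
hard = focal_loss([0.2, 0.5, 0.3], target=0)

# The same client update has less influence when it is stale.
fresh = async_mix([0.0, 0.0], [1.0, 2.0], staleness=0)
stale = async_mix([0.0, 0.0], [1.0, 2.0], staleness=4)
```

Under these assumptions, no node has to wait for the other devices before contributing, and rare, hard-to-classify lesion classes dominate the loss on nodes that hold only a few images.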

Results

We found that widely used synchronous FL methods perform poorly under conditions of data scarcity and imbalance. Specifically, using FedAvg, FedProx, and FedNova, the trained classifiers achieved AUROC values of 0.57-0.67, 0.63-0.66, and 0.64-0.67, respectively, on the held-out test set across various experimental settings. In contrast, our proposed asynchronous and focal approach achieved a test AUROC of 0.78-0.89 after 40 global training epochs. This performance is significantly closer to the optimal AUROC of 0.91, which is achievable by training a classifier with all the data on a centralised server without FL.
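For readers who want to see how an AUROC can be obtained for a 3-class classifier, the sketch below computes a macro-averaged one-vs-rest AUROC from predicted class probabilities. The abstract does not state which multi-class averaging scheme was used, so macro one-vs-rest is an assumption, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def binary_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(probs: Sequence[Sequence[float]],
                    labels: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of the one-vs-rest AUROC of every class."""
    per_class: List[float] = []
    for c in range(n_classes):
        scores = [row[c] for row in probs]
        binary = [1 if y == c else 0 for y in labels]
        per_class.append(binary_auroc(scores, binary))
    return sum(per_class) / n_classes


# Toy held-out set: six images, two per class, with predicted probabilities.
toy_labels = [0, 0, 1, 1, 2, 2]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
toy_auroc = macro_ovr_auroc(toy_probs, toy_labels, n_classes=3)
```

The pairwise formulation is quadratic in the number of test images, which is acceptable for a small held-out set; a rank-based implementation would be preferable at scale.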

Conclusions

These results demonstrate that our approach provides a useful solution to implement an efficient FL scheme under the conditions of data scarcity and class imbalance that are commonly found in realistic telemedicine and IoMT applications.

Source journal

Computer methods and programs in biomedicine (Engineering and Technology; Engineering, Biomedical)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.