Shichao Ma, Yun-Hin Chan, Edith C.H. Ngai, Joshua W.K. Ho
Computer Methods and Programs in Biomedicine, Volume 272, Article 109073. Published 2025-09-09. DOI: 10.1016/j.cmpb.2025.109073
Impact factor: 4.8. JCR: Q1 (Computer Science, Interdisciplinary Applications).
https://www.sciencedirect.com/science/article/pii/S0169260725004900
Asynchronous and focal federated learning for skin lesion classification under local data scarcity and class imbalance
Background and Objectives
Federated learning (FL) enables the training of machine learning (ML) models using data from multiple data nodes without direct data transfer, which makes it a good choice for healthcare ML applications where data privacy and security are concerns. Most standard FL approaches assume a small number of nodes, each contributing a sizable amount of data. However, in emerging healthcare settings such as telemedicine and the Internet of Medical Things (IoMT), it is necessary to consider the situation in which there is a large number of nodes, each contributing only a small number of data points (data scarcity) that are non-independent across nodes (class imbalance).
Methods
In this paper, we propose an asynchronous and focal update approach to enable FL to address this problem. In particular, we demonstrate its use in a teledermatology setting, in which a skin lesion image classifier is continuously updated based on data in a highly distributed network of mobile devices. We performed a simulation experiment in which 1,268 skin lesion images across 798 mobile devices contributed to the training of a 3-class classifier in an FL framework.
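The abstract does not spell out the update rule, so the following is only a minimal sketch of what an "asynchronous and focal" update could look like, under two stated assumptions: that "focal" refers to the focal loss often used for class imbalance, and that the server mixes each client update into the global model as it arrives, discounted by staleness. The names `focal_loss`, `async_mix`, `gamma`, and `base_alpha` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs is the predicted class distribution and target the true class index.
    gamma = 0 recovers plain cross-entropy; larger gamma down-weights samples
    the model already classifies confidently. (Assumed reading of "focal".)
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def async_mix(global_w: List[float], client_w: List[float],
              staleness: int, base_alpha: float = 0.5) -> List[float]:
    """Mix a single client's weights into the global model on arrival.

    Assumption: the mixing rate decays with staleness as
    alpha = base_alpha / (1 + staleness), so updates computed against an
    older global model count for less.
    """
    if staleness < 0:
        raise ValueError("staleness must be non-negative")
    alpha = base_alpha / (1 + staleness)
    return [(1 - alpha) * g + alpha * c for g, c in zip(global_w, client_w)]


# A confidently correct sample contributes far less loss than a hard one.
easy = focal_loss([0.9, 0.05, 0.05], target=0)
hard = focal_loss([0.2, 0.5, 0.3], target=0)

# The server updates immediately per client instead of waiting for a round.
global_model = [0.0, 0.0]
global_model = async_mix(global_model, [1.0, 2.0], staleness=0)  # alpha = 0.5
global_model = async_mix(global_model, [1.0, 2.0], staleness=3)  # alpha = 0.125
print(easy, hard, global_model)
```

In this sketch the server never blocks on slow devices, which is the property that distinguishes it from round-based schemes; how the authors actually weight or schedule updates is not stated in the abstract.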
Results
We found that widely used synchronous FL methods perform poorly under conditions of data scarcity and imbalance. Specifically, using FedAvg, FedProx, and FedNova, the trained classifiers achieved AUROC values of 0.57-0.67, 0.63-0.66, and 0.64-0.67, respectively, on the held-out test set across various experimental settings. In contrast, our proposed asynchronous and focal approach achieved a test AUROC of 0.78-0.89 after 40 global training epochs. This performance is significantly closer to the optimal AUROC of 0.91, which is achievable by training a classifier with all the data on a centralised server without FL.
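The abstract reports AUROC for a 3-class classifier without saying how the multi-class value was computed. As an illustration only, the sketch below assumes a macro-averaged one-vs-rest AUROC, with each binary AUROC computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function names and the toy predictions are hypothetical.

```python
from typing import Sequence


def binary_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(probs: Sequence[Sequence[float]],
                    labels: Sequence[int], n_classes: int = 3) -> float:
    """Average the one-vs-rest AUROC over all classes (assumed convention)."""
    per_class = []
    for c in range(n_classes):
        scores = [row[c] for row in probs]            # predicted P(class c)
        binary = [1 if y == c else 0 for y in labels]  # class c vs. the rest
        per_class.append(binary_auroc(scores, binary))
    return sum(per_class) / n_classes


# Toy predictions for four images and three lesion classes.
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.3, 0.6, 0.1],
]
toy_labels = [0, 1, 2, 0]
toy_auroc = macro_ovr_auroc(toy_probs, toy_labels)
print(round(toy_auroc, 4))
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a small held-out set; a rank-based implementation would be the usual choice at scale.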
Conclusions
These results demonstrate that our approach provides a useful solution to implement an efficient FL scheme under the conditions of data scarcity and class imbalance that are commonly found in realistic telemedicine and IoMT applications.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to encourage the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.