Guojun Tang, Jason E. Black, Tyler S. Williamson, Steve H. Drew
arXiv - CS - Computational Engineering, Finance, and Science. Published 2024-08-21. DOI: arxiv-2408.12029 (https://doi.org/arxiv-2408.12029)
Federated Diabetes Prediction in Canadian Adults Using Real-world Cross-Province Primary Care Data
Integrating Electronic Health Records (EHR) with machine
learning presents opportunities to enhance the accuracy and accessibility of
data-driven diabetes prediction. In particular, developing data-driven machine
learning models can enable early identification of patients at high risk of
diabetes, potentially leading to more effective therapeutic strategies and
reduced healthcare costs. However, regulatory restrictions create barriers to
developing centralized predictive models. This paper addresses this challenge
by introducing a federated learning approach, which amalgamates predictive
models without centralized data storage and processing, thus avoiding privacy
issues. This marks the first application of federated learning to predict
diabetes using real clinical datasets in Canada extracted from the Canadian
Primary Care Sentinel Surveillance Network (CPCSSN) without cross-province
patient data sharing. We address class-imbalance issues through downsampling
techniques and compare federated learning performance against province-based
and centralized models. Experimental results show that the federated MLP model
performs similarly to or better than the model trained with the
centralized approach. However, the federated logistic regression model
performed worse than its centralized counterpart.
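
The two techniques named in the abstract, downsampling for class imbalance and combining per-province models without pooling patient data, can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule or the sampling procedure, so the code below assumes sample-size-weighted parameter averaging (FedAvg-style) and random downsampling of the majority class to a 1:1 ratio. The names `downsample_majority` and `fed_avg`, the toy rows, and the weight vectors are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (features, label), where label 1 = diabetes.
Row = Tuple[Sequence[float], int]


def downsample_majority(rows: List[Row], seed: int = 0) -> List[Row]:
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


def fed_avg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average per-site parameter vectors, weighting each site by its sample count.

    Only model parameters and sample counts leave a site; patient rows do not.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Toy data for one site: one positive case and three negative cases.
site_a = [([0.1], 1), ([0.2], 0), ([0.3], 0), ([0.4], 0)]
balanced_a = downsample_majority(site_a)

# Assumed parameter vectors from two sites holding 100 and 300 training rows.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

In a full training loop, each site would train locally on its balanced data, send its parameters to the aggregator, and receive the averaged parameters back for the next round. This sketch shows a single aggregation step only.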