Mădălin Mămuleanu, C. Ionete, Anca Albița, D. Selișteanu
{"title":"Distributed Deep Learning Model for Predicting the Risk of Diabetes, Trained on Imbalanced Dataset","authors":"Mădălin Mămuleanu, C. Ionete, Anca Albița, D. Selișteanu","doi":"10.1109/ICCC54292.2022.9805989","DOIUrl":null,"url":null,"abstract":"When developing and training a deep learning model, usually the dataset is balanced and contains approximately the same number of samples for each class. However, in some fields, this is not always the case. In healthcare, a dataset can have imbalanced classes due to few investigations for a specific lesion, rare diseases or because the institution did not have enough patients in the study. Besides that, these datasets are usually distributed across many institutions (hospitals, healthcare centers) and trying to obtain a complete dataset is almost impossible, especially due to legal concerns. This paper proposes to train a deep learning model for predicting the risk of diabetes in a distributed way, called federated learning. Our assumptions are that the data is distributed across many entities and merging it is not possible. In federated learning, the deep learning model is trained across multiple entities. The training is coordinated by a server which, at the end of the training session, compiles a new model based on the results obtained by each entity. The dataset used in our paper is imbalanced, having only 268 positive cases from a total of 768 cases. Training a deep learning model on the dataset as it is can lead to a biased model. Hence, for solving this problem, oversampling techniques for balancing the dataset are applied.","PeriodicalId":167963,"journal":{"name":"2022 23rd International Carpathian Control Conference (ICCC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 23rd International Carpathian Control Conference (ICCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCC54292.2022.9805989","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
When developing and training a deep learning model, the dataset is usually balanced, containing approximately the same number of samples for each class. However, this is not always the case in some fields. In healthcare, a dataset can have imbalanced classes because a specific lesion is rarely investigated, because a disease is rare, or because the institution did not enroll enough patients in the study. Moreover, such datasets are usually distributed across many institutions (hospitals, healthcare centers), and obtaining a complete dataset is almost impossible, especially due to legal concerns. This paper proposes training a deep learning model for predicting the risk of diabetes in a distributed way, using federated learning. We assume that the data is distributed across many entities and cannot be merged. In federated learning, the deep learning model is trained across multiple entities. The training is coordinated by a server which, at the end of the training session, compiles a new model from the results obtained by each entity. The dataset used in our paper is imbalanced, with only 268 positive cases out of a total of 768. Training a deep learning model on the dataset as it is can lead to a biased model; hence, oversampling techniques are applied to balance the dataset.
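The abstract describes a server that coordinates training across entities and compiles a new model from their results, but it does not publish the aggregation rule. The sketch below assumes federated averaging (FedAvg), the most common choice, and uses a simple logistic-regression "client" purely to stay self-contained; all names (Client, fed_avg, the round count) are illustrative, not the authors' code.

```python
# Minimal federated-averaging sketch: each entity trains on its private data,
# and a coordinating server compiles a new global model by averaging the
# local weights, weighted by local dataset size.
import numpy as np

rng = np.random.default_rng(0)

class Client:
    """One participating institution with its own private dataset."""
    def __init__(self, X, y):
        self.X, self.y = X, y

    @property
    def num_samples(self):
        return len(self.y)

    def train(self, w, epochs=5, lr=0.1):
        """Run a few local gradient-descent epochs starting from the global weights."""
        w = w.copy()
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-self.X @ w))            # sigmoid predictions
            grad = self.X.T @ (p - self.y) / self.num_samples
            w -= lr * grad
        return w

def fed_avg(local_weights, sizes):
    """Server-side aggregation: average the local models, weighted by dataset size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(local_weights, sizes))

# Synthetic stand-in for data spread across three institutions.
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 8))                            # 8 tabular features per patient
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    clients.append(Client(X, y))

global_w = np.zeros(8)
for _ in range(20):                                          # communication rounds
    local_ws = [c.train(global_w) for c in clients]
    global_w = fed_avg(local_ws, [c.num_samples for c in clients])
print("global weights after 20 rounds:", np.round(global_w, 3))
```

The key property illustrated here is that raw patient data never leaves a client; only model weights are sent to the server, which addresses the legal concerns mentioned above.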
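The abstract also mentions oversampling to balance the 268-positive / 768-total dataset but does not name a specific technique. The snippet below shows SMOTE from the imbalanced-learn package as one common oversampling method, applied to toy data with the same class counts; treat it as an illustrative assumption rather than the paper's exact procedure.

```python
# Oversampling an imbalanced tabular dataset with SMOTE (imbalanced-learn).
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)

# Toy data mirroring the class counts in the abstract: 500 negative, 268 positive.
X = rng.normal(size=(768, 8))
y = np.array([0] * 500 + [1] * 268)

X_balanced, y_balanced = SMOTE(random_state=0).fit_resample(X, y)
print("before:", np.bincount(y))             # [500 268]
print("after: ", np.bincount(y_balanced))    # [500 500]
```

In a federated setting, such oversampling would be applied locally by each entity before training, since the server never sees the raw samples.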