Distributed Deep Learning Model for Predicting the Risk of Diabetes, Trained on Imbalanced Dataset

Mădălin Mămuleanu, C. Ionete, Anca Albița, D. Selișteanu

2022 23rd International Carpathian Control Conference (ICCC), published 2022-05-29. DOI: 10.1109/ICCC54292.2022.9805989
When developing and training a deep learning model, the dataset is usually balanced, containing approximately the same number of samples for each class. In some fields, however, this is not the case. In healthcare, a dataset can have imbalanced classes because few investigations exist for a specific lesion, because a disease is rare, or because the institution did not enroll enough patients in the study. Moreover, such datasets are usually distributed across many institutions (hospitals, healthcare centers), and obtaining a complete merged dataset is almost impossible, especially due to legal concerns. This paper proposes training a deep learning model for predicting the risk of diabetes in a distributed way, using federated learning. We assume that the data is distributed across many entities and cannot be merged. In federated learning, the deep learning model is trained across multiple entities, and the training is coordinated by a server which, at the end of the training session, compiles a new model from the results obtained by each entity. The dataset used in our paper is imbalanced, with only 268 positive cases out of 768 total. Training a deep learning model on the dataset as-is can lead to a biased model; hence, oversampling techniques are applied to balance the dataset.
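The abstract describes a server that "compiles a new model based on the results obtained by each entity" but does not name the aggregation rule. A common choice for this step is federated averaging (FedAvg), where the server averages client weights in proportion to each client's local dataset size. The sketch below is a minimal illustration of that idea, not the authors' actual implementation; the function and variable names are hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-client model weights, weighted by local dataset size.

    client_weights: list of models, each a list of numpy arrays (one per layer).
    client_sizes:   number of local training samples at each client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Weighted sum over clients: w_k * (n_k / n_total)
        layer_avg = sum(w[layer] * (n / total)
                        for w, n in zip(client_weights, client_sizes))
        averaged.append(layer_avg)
    return averaged

# Three hypothetical clients, each holding a single-layer model of 4 weights
clients = [[np.array([1.0, 2.0, 3.0, 4.0])],
           [np.array([2.0, 3.0, 4.0, 5.0])],
           [np.array([3.0, 4.0, 5.0, 6.0])]]
sizes = [100, 100, 100]  # equal local dataset sizes -> plain mean
global_weights = federated_average(clients, sizes)
```

In a full training loop, the server would broadcast `global_weights` back to the clients for the next round of local training, repeating until convergence.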
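The abstract also mentions "oversampling techniques" for balancing the 268-positive / 500-negative split without specifying which ones. The simplest such technique is random oversampling: duplicating randomly chosen minority-class samples until both classes are the same size. The sketch below assumes that method for illustration; the paper may use a more elaborate scheme such as synthetic sample generation.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes balance."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    min_label = 1 if minority is pos else 0
    maj_label = 1 - min_label
    # Draw (with replacement) enough minority duplicates to match the majority
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced_x = majority + minority + extra
    balanced_y = ([maj_label] * len(majority)
                  + [min_label] * (len(minority) + len(extra)))
    return balanced_x, balanced_y

# Hypothetical dataset matching the paper's class counts: 268 positive, 500 negative
features = [[float(i)] for i in range(768)]
labels = [1] * 268 + [0] * 500
bal_x, bal_y = random_oversample(features, labels)
# Result: 500 positive and 500 negative samples
```

Note that oversampling should be applied only to the training split, after the train/test separation, so that duplicated samples do not leak into the evaluation set.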