Study of Data Imbalance and Asynchronous Aggregation Algorithm on Federated Learning System
Senapati Sang Diwangkara, A. I. Kistijantoro
2020 International Conference on Information Technology Systems and Innovation (ICITSI), 19 October 2020. DOI: 10.1109/ICITSI50517.2020.9264958
As machine learning techniques become more widespread, the need for more elaborate datasets becomes more pressing. Such datasets are usually gathered with data collection methods that pay little to no attention to the data owners' privacy and consent. Federated learning is an approach that tries to solve this problem: it trains a machine learning model without centrally storing the required data. One weakness of current implementations, however, is their slow convergence, even though they distribute the training task across many nodes. This is mainly caused by the synchronous nature of the current aggregation algorithm. In this paper, we observe the effect of an asynchronous aggregation algorithm on convergence time and test, at various levels, two factors that might affect it: staleness and data imbalance. We implement the asynchronous aggregation algorithm by adapting the Stale Synchronous Parallel (SSP) algorithm. We test our system on the MNIST dataset and find that asynchronous aggregation improves convergence time in a federated learning system that has a large inequality in per-server update frequency and a relatively balanced data distribution.
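The abstract states that the asynchronous aggregation is built by adapting Stale Synchronous Parallel and that staleness is one of the factors studied, but it does not give the aggregation rule itself. The following is a minimal Python sketch, under stated assumptions, of how an SSP staleness bound can gate asynchronous aggregation: each client may start a new round only while it is at most `staleness_bound` rounds ahead of the slowest client, and the server merges each update as soon as it arrives. The class `SSPServer`, the fixed convex mixing weight `mix_rate`, and the toy `local_step` objective are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SSPServer:
    """Hypothetical parameter server: asynchronous aggregation gated by an SSP bound."""
    weights: List[float]
    num_clients: int
    staleness_bound: int      # s: max allowed gap (in rounds) to the slowest client
    mix_rate: float = 0.5     # assumed weight given to each incoming client update
    clocks: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # clocks[i] counts the rounds client i has completed; all start at 0.
        if not self.clocks:
            self.clocks = [0] * self.num_clients

    def can_proceed(self, client: int) -> bool:
        # SSP rule: a client may start its next round only if it is at most
        # `staleness_bound` rounds ahead of the slowest client.
        # With staleness_bound == 0 this reduces to fully synchronous rounds.
        return self.clocks[client] - min(self.clocks) <= self.staleness_bound

    def pull(self, client: int) -> List[float]:
        # Hand out a copy of the current global model, or block the client.
        if not self.can_proceed(client):
            raise RuntimeError(f"client {client} must wait for slower clients")
        return list(self.weights)

    def push(self, client: int, local_weights: List[float]) -> None:
        # Asynchronous aggregation: merge this single update immediately instead
        # of waiting for every client (assumed convex-mixing rule).
        a = self.mix_rate
        self.weights = [(1 - a) * g + a * w for g, w in zip(self.weights, local_weights)]
        self.clocks[client] += 1


def local_step(weights: List[float], target: List[float], lr: float = 0.5) -> List[float]:
    # Hypothetical local training: one gradient step on 0.5 * ||w - target||^2.
    return [w - lr * (w - t) for w, t in zip(weights, target)]


server = SSPServer(weights=[0.0, 0.0], num_clients=2, staleness_bound=2)
fast, slow = 0, 1

# The fast client keeps training while the slow one is idle, until SSP blocks it.
rounds_before_block = 0
while server.can_proceed(fast):
    w = server.pull(fast)
    server.push(fast, local_step(w, target=[1.0, 1.0]))
    rounds_before_block += 1
print(rounds_before_block)  # 3
```

With `staleness_bound=2` the fast client completes three rounds (clock gaps 0, 1 and 2) before it must wait; once the slow client pushes one update, the gap drops back to 2 and the fast client may continue. Raising the bound lets frequently updating clients contribute more often, at the cost of mixing in updates computed from older global models, which is the staleness trade-off the abstract refers to.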