Distributed Training for Speech Recognition using Local Knowledge Aggregation and Knowledge Distillation in Heterogeneous Systems
Hongrui Shi, Valentin Radu, Po Yang
Proceedings of the 3rd Workshop on Machine Learning and Systems, 2023-05-08
https://doi.org/10.1145/3578356.3592591
Data privacy and data protection are crucial issues for automatic speech recognition (ASR) systems that rely on client-generated data for training. The best protection is achieved when training is carried out in a distributed fashion, close to each client's local data, rather than being centralised. However, distributed training suffers from system heterogeneity, because clients have unequal computation resources, and from data heterogeneity, because the training data are not independent and identically distributed (non-IID). To tackle these challenges, we introduce FedKAD, a Federated Learning (FL) framework that uses local Knowledge Aggregation over top-level feature maps together with Knowledge Distillation (KD). We show that FedKAD achieves better communication efficiency than standard FL methods that use uniform models, because it transfers the parameters of smaller client models, and better overall accuracy than FedMD, an alternative KD-based approach designed for heterogeneous data. Our work enables faster, cheaper and more inclusive participation of clients in heterogeneous distributed training.
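The abstract does not spell out how FedKAD aggregates or distils knowledge, so the sketch below is not the authors' algorithm. It only illustrates the generic idea behind KD-based federated training with heterogeneous models: clients of different sizes produce outputs for the same input, those outputs are combined into a shared target, and each client is penalised for diverging from it. Several choices here are assumptions made for illustration: averaging logits rather than top-level feature maps, using KL divergence as the loss, the temperature value, and the function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_logits):
    """Element-wise mean of per-client logits (hypothetical aggregation rule)."""
    n = len(client_logits)
    return [sum(column) / n for column in zip(*client_logits)]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Three clients with different model sizes emit logits for the same input.
client_logits = [
    [2.0, 0.5, -1.0],
    [1.5, 1.0, -0.5],
    [2.5, 0.0, -1.5],
]

# The aggregated output acts as the shared distillation target.
teacher = aggregate(client_logits)

# Each client's distillation loss against the shared target.
losses = [distillation_loss(c, teacher) for c in client_logits]
```

Only model outputs are exchanged in this sketch, which is why approaches of this kind can accommodate clients whose architectures differ. The first client's logits happen to equal the average, so its loss is zero, while the other two receive a positive penalty.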