Workload Allocation for Distributed Coded Machine Learning: From Offline Model-Based to Online Model-Free

Yuxuan Jiang, Qiang Ye, E. T. Fapi, Wenting Sun, Fudong Li
{"title":"分布式编码机器学习的工作量分配:从基于离线模型到无在线模型","authors":"Yuxuan Jiang, Qiang Ye, E. T. Fapi, Wenting Sun, Fudong Li","doi":"10.1109/IOTM.001.2300247","DOIUrl":null,"url":null,"abstract":"Distributed machine learning (ML) is an important Internet-of-Things (IoT) application. In traditional partitioned learning (PL) paradigm, a coordinator divides a high-dimensional dataset into subsets, which are processed on IoT devices. The execution time of PL can be seriously bottlenecked by slow devices named stragglers. To mitigate the negative impact of stragglers, distributed coded machine learning (DCML) was recently proposed to inject redundancy into the subsets using coding techniques. With this redundancy, the coordinator no longer requires the processing results from all devices, but only from a subgroup, where stragglers can be eliminated. This article aims to bring the burgeoning field of DCML to the wider community. After outlining the principles of DCML, we focus on its workload allocation, which addresses the appropriate level of injected redundancy to minimize the overall execution time. We highlight the fundamental trade-off and point out two critical design choices in workload allocation: model-based versus model-free, and offline versus online. Despite the predominance of offline model-based approaches in the literature, online model-based approaches also have a wide array of use case scenarios, but remain largely unexplored. At the end of the article, we propose the first online model-free workload allocation scheme for DCML, and identify future paths and opportunities along this direction.","PeriodicalId":235472,"journal":{"name":"IEEE Internet of Things Magazine","volume":"44 1","pages":"100-106"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Workload Allocation for Distributed Coded Machine Learning: From Offline Model-Based to Online Model-Free\",\"authors\":\"Yuxuan Jiang, Qiang Ye, E. T. Fapi, Wenting Sun, Fudong Li\",\"doi\":\"10.1109/IOTM.001.2300247\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed machine learning (ML) is an important Internet-of-Things (IoT) application. In traditional partitioned learning (PL) paradigm, a coordinator divides a high-dimensional dataset into subsets, which are processed on IoT devices. The execution time of PL can be seriously bottlenecked by slow devices named stragglers. To mitigate the negative impact of stragglers, distributed coded machine learning (DCML) was recently proposed to inject redundancy into the subsets using coding techniques. With this redundancy, the coordinator no longer requires the processing results from all devices, but only from a subgroup, where stragglers can be eliminated. This article aims to bring the burgeoning field of DCML to the wider community. After outlining the principles of DCML, we focus on its workload allocation, which addresses the appropriate level of injected redundancy to minimize the overall execution time. We highlight the fundamental trade-off and point out two critical design choices in workload allocation: model-based versus model-free, and offline versus online. Despite the predominance of offline model-based approaches in the literature, online model-based approaches also have a wide array of use case scenarios, but remain largely unexplored. 
At the end of the article, we propose the first online model-free workload allocation scheme for DCML, and identify future paths and opportunities along this direction.\",\"PeriodicalId\":235472,\"journal\":{\"name\":\"IEEE Internet of Things Magazine\",\"volume\":\"44 1\",\"pages\":\"100-106\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Internet of Things Magazine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IOTM.001.2300247\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Magazine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IOTM.001.2300247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
IEEE Internet of Things Magazine, vol. 44, no. 1, pp. 100-106, July 2024. DOI: 10.1109/IOTM.001.2300247
Distributed machine learning (ML) is an important Internet-of-Things (IoT) application. In the traditional partitioned learning (PL) paradigm, a coordinator divides a high-dimensional dataset into subsets, which are processed in parallel on IoT devices. The execution time of PL can be severely bottlenecked by slow devices known as stragglers. To mitigate the negative impact of stragglers, distributed coded machine learning (DCML) was recently proposed to inject redundancy into the subsets using coding techniques. With this redundancy, the coordinator no longer requires the processing results from all devices, but only from a subgroup, so stragglers can be excluded. This article aims to bring the burgeoning field of DCML to the wider community. After outlining the principles of DCML, we focus on its workload allocation, which determines the appropriate level of injected redundancy to minimize the overall execution time. We highlight the fundamental trade-off and point out two critical design choices in workload allocation: model-based versus model-free, and offline versus online. Despite the predominance of offline model-based approaches in the literature, online model-free approaches also have a wide array of use-case scenarios, yet remain largely unexplored. At the end of the article, we propose the first online model-free workload allocation scheme for DCML, and identify future paths and opportunities along this direction.
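To make the redundancy idea concrete, below is a minimal sketch of coded distributed computation, assuming an (n, k) MDS code realized with a Vandermonde generator matrix over the reals, applied to a matrix-vector product. All names (k, n, blocks, coded, etc.) are illustrative, not taken from the article; the point is only that any k of the n coded results suffice, so the n - k slowest devices can be ignored.

```python
# Sketch: (n, k) MDS-coded matrix-vector multiplication, the core DCML idea.
import numpy as np

rng = np.random.default_rng(0)

k, n = 3, 5                      # k data blocks, n > k coded blocks (n - k redundancy)
A = rng.standard_normal((6, 4))  # full data matrix, split row-wise into k blocks
x = rng.standard_normal(4)       # vector each device multiplies its block by

blocks = np.split(A, k)          # k uncoded sub-matrices (plain PL would stop here)
G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)  # n x k generator

# Encode: device i receives one coded block, a linear mix of the k data blocks.
coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

# Each device computes its partial product; stragglers simply never return.
results = {i: coded[i] @ x for i in range(n)}

# Decode from the FIRST k responders: any k rows of a Vandermonde matrix are
# invertible, so the two slowest devices are never waited on.
fast = sorted(results)[:k]                   # pretend these responded first
mix = np.array([results[i] for i in fast])   # k coded partial results
dec = np.linalg.solve(G[fast, :], mix)       # recover the k uncoded partials
recovered = np.concatenate(dec)

assert np.allclose(recovered, A @ x)         # matches the uncoded computation
```

The workload-allocation question the article studies is how much redundancy (here, n - k) and how much data per device to choose: too little redundancy leaves the coordinator waiting on stragglers, too much wastes computation on every device. As a purely hypothetical illustration of the online model-free flavor (this is not the scheme proposed in the article), the loop below re-weights each device's share of the workload from observed completion times alone, with no latency model:

```python
# Hypothetical online model-free allocation: load follows observed throughput.
import numpy as np

rng = np.random.default_rng(1)
n, rounds, total_rows = 5, 50, 1000
speeds = np.array([4.0, 3.5, 3.0, 1.0, 0.5])   # unknown true rows/sec; last two straggle

shares = np.full(n, 1.0 / n)                   # start from a uniform split
ema = np.ones(n)                               # running throughput estimate per device
for t in range(rounds):
    rows = np.maximum(1, (shares * total_rows).astype(int))
    noise = rng.lognormal(0.0, 0.3, n)         # transient slowdowns
    times = rows / (speeds * noise)            # observed per-device round times
    ema = 0.8 * ema + 0.2 * (rows / times)     # model-free feedback: measured rows/sec
    shares = ema / ema.sum()                   # next round: load proportional to speed

print(np.round(shares, 3))                     # fast devices end up with larger shares
```

Because the loop learns only from measured round times, it adapts when device speeds drift at run time, which is exactly the setting where offline model-based allocation (fixed before execution from an assumed latency model) breaks down.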