Workload Allocation for Distributed Coded Machine Learning: From Offline Model-Based to Online Model-Free

Yuxuan Jiang, Qiang Ye, E. T. Fapi, Wenting Sun, Fudong Li
{"title":"Workload Allocation for Distributed Coded Machine Learning: From Offline Model-Based to Online Model-Free","authors":"Yuxuan Jiang, Qiang Ye, E. T. Fapi, Wenting Sun, Fudong Li","doi":"10.1109/IOTM.001.2300247","DOIUrl":null,"url":null,"abstract":"Distributed machine learning (ML) is an important Internet-of-Things (IoT) application. In traditional partitioned learning (PL) paradigm, a coordinator divides a high-dimensional dataset into subsets, which are processed on IoT devices. The execution time of PL can be seriously bottlenecked by slow devices named stragglers. To mitigate the negative impact of stragglers, distributed coded machine learning (DCML) was recently proposed to inject redundancy into the subsets using coding techniques. With this redundancy, the coordinator no longer requires the processing results from all devices, but only from a subgroup, where stragglers can be eliminated. This article aims to bring the burgeoning field of DCML to the wider community. After outlining the principles of DCML, we focus on its workload allocation, which addresses the appropriate level of injected redundancy to minimize the overall execution time. We highlight the fundamental trade-off and point out two critical design choices in workload allocation: model-based versus model-free, and offline versus online. Despite the predominance of offline model-based approaches in the literature, online model-based approaches also have a wide array of use case scenarios, but remain largely unexplored. At the end of the article, we propose the first online model-free workload allocation scheme for DCML, and identify future paths and opportunities along this direction.","PeriodicalId":235472,"journal":{"name":"IEEE Internet of Things Magazine","volume":"44 1","pages":"100-106"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Magazine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IOTM.001.2300247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Distributed machine learning (ML) is an important Internet-of-Things (IoT) application. In the traditional partitioned learning (PL) paradigm, a coordinator divides a high-dimensional dataset into subsets, which are processed on IoT devices. The execution time of PL can be severely bottlenecked by slow devices known as stragglers. To mitigate the negative impact of stragglers, distributed coded machine learning (DCML) was recently proposed to inject redundancy into the subsets using coding techniques. With this redundancy, the coordinator no longer requires the processing results from all devices, but only from a subgroup, so stragglers can be excluded. This article aims to bring the burgeoning field of DCML to the wider community. After outlining the principles of DCML, we focus on its workload allocation, which determines the appropriate level of injected redundancy to minimize the overall execution time. We highlight the fundamental trade-off and point out two critical design choices in workload allocation: model-based versus model-free, and offline versus online. Although offline model-based approaches predominate in the literature, online model-free approaches also have a wide array of use-case scenarios, yet remain largely unexplored. At the end of the article, we propose the first online model-free workload allocation scheme for DCML and identify future paths and opportunities in this direction.
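To make the coding idea concrete, here is a minimal sketch (not taken from the article) of (n, k) MDS-coded matrix-vector multiplication, a standard DCML building block: the coordinator encodes k sub-tasks into n coded tasks so that any k returned results suffice to recover the full product, and the n - k slowest workers can be ignored. All names and sizes below are illustrative.

```python
import numpy as np

# (n, k) MDS-coded matrix-vector multiplication: n workers, any k
# results suffice to decode. All parameter values are illustrative.
n, k = 5, 3
m, d = 6, 4                                  # A is m x d, m divisible by k
rng = np.random.default_rng(0)
A = rng.standard_normal((m, d))
x = rng.standard_normal(d)

blocks = np.split(A, k)                      # uncoded sub-tasks A_1..A_k
G = np.vander(np.arange(1, n + 1), k)        # n x k Vandermonde generator:
                                             # any k of its rows are invertible
coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

# Worker i computes coded[i] @ x. Suppose workers {0, 2, 4} respond
# first; the remaining two are stragglers and are simply ignored.
fastest = [0, 2, 4]
partial = np.stack([coded[i] @ x for i in fastest])

# Decoding reduces to a k x k linear system picked out by the fast rows.
decoded = np.linalg.solve(G[fastest], partial)
y = decoded.reshape(-1)
assert np.allclose(y, A @ x)                 # matches the uncoded product
```

The workload-allocation question is then how much redundancy to inject: a larger n relative to k shortens the wait for the slowest needed worker, but inflates every worker's load. An online model-free allocator can learn this trade-off purely from observed round times, with no straggler model. The epsilon-greedy sketch below is a generic bandit formulation given for illustration only, not the scheme proposed in the article; the toy round simulator and all constants are assumptions.

```python
import random

random.seed(0)
levels = [0.0, 0.5, 1.0]                 # candidate redundancy ratios (hypothetical)
stats = {r: [0, 0.0] for r in levels}    # per-level [round count, mean time]
EPS = 0.1                                # exploration probability

def run_round(r, n=10):
    """Toy stand-in for one DCML round under redundancy ratio r."""
    k = max(1, round(n / (1.0 + r)))     # fewer results needed as r grows
    finish = sorted((1.0 + r) * random.expovariate(1.0) for _ in range(n))
    return finish[k - 1]                 # wait only for the fastest k workers

for t in range(500):
    if t < len(levels):                  # sample every level once
        r = levels[t]
    elif random.random() < EPS:          # explore
        r = random.choice(levels)
    else:                                # exploit the fastest level so far
        r = min(levels, key=lambda l: stats[l][1])
    elapsed = run_round(r)
    count, mean = stats[r]
    stats[r] = [count + 1, mean + (elapsed - mean) / (count + 1)]

print("estimated best redundancy ratio:", min(levels, key=lambda l: stats[l][1]))
```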