Comparison of distributed Machine Learning frameworks in a fog environment: Conceptual and Performance analysis

IF 7.6, Zone 3 (Computer Science), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Internet of Things, Volume 34, Article 101774. Pub Date: 2025-09-27. DOI: 10.1016/j.iot.2025.101774

Anusri Sanyadanam, Satish Narayana Srirama


Citations: 0


Comparison of distributed Machine Learning frameworks in a fog environment: Conceptual and Performance analysis

The growing demand for real-time, latency-sensitive, and privacy-preserving analytics in IoT has established fog computing as an alternative to cloud-based processing. However, training machine learning and deep learning (ML/DL) models in fog environments remains challenging due to limited computational resources. Despite the availability of numerous distributed ML frameworks, a comprehensive evaluation tailored to fog devices is lacking. This study conducts a comparative analysis of distributed ML frameworks for neural network training on resource-constrained fog nodes, using Raspberry Pi (RPi) devices. We started with frameworks based on the Actor programming model and then extended the study to general-purpose distributed frameworks suitable for fog computing devices. We evaluate four actor-model-based frameworks (Akkordeon, DistBelief with Akka, Aktorain, and CANTO) along with general-purpose distributed frameworks (KubeRay, TensorFlow MultiWorkerMirroredStrategy (MWMS), Dask Distributed, and Spark with Elephas). The frameworks are compared across key metrics, including training time, accuracy, and resource utilization, on diverse datasets. Our results highlight performance trade-offs: KubeRay offers a balance between efficiency and performance, Dask and MWMS achieve higher accuracy at the cost of increased latency, and Spark with Elephas excels in speed but struggles with accuracy. Although CANTO is optimized for fog-based training, it faces challenges with complex datasets. Overall, KubeRay emerges as the most practical choice for fog-based ML training because of its additional support for scalability and fault tolerance. This work bridges a critical research gap by providing experimental insights into the feasibility and performance of distributed ML frameworks in fog computing environments.


Source journal

Internet of Things Multiple-

CiteScore

3.60

Self-citation rate

5.10%

Articles published

115

Review time

37 days

About the journal: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT. The journal will place a high priority on timely publication, and provide a home for high quality. Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.