A Novel Resource Management Framework for Blockchain-Based Federated Learning in IoT Networks

IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Aman Mishra;Yash Garg;Om Jee Pandey;Mahendra K. Shukla;Athanasios V. Vasilakos;Rajesh M. Hegde
{"title":"物联网网络中基于区块链的联盟学习的新型资源管理框架","authors":"Aman Mishra;Yash Garg;Om Jee Pandey;Mahendra K. Shukla;Athanasios V. Vasilakos;Rajesh M. Hegde","doi":"10.1109/TSUSC.2024.3358915","DOIUrl":null,"url":null,"abstract":"At present, the centralized learning models, used for IoT applications generating large amount of data, face several challenges such as bandwidth scarcity, more energy consumption, increased uses of computing resources, poor connectivity, high computational complexity, reduced privacy, and large latency towards data transfer. In order to address the aforementioned challenges, Blockchain-Enabled Federated Learning Networks (BFLNs) emerged recently, which deal with trained model parameters only, rather than raw data. BFLNs provide enhanced security along with improved energy-efficiency and Quality-of-Service (QoS). However, BFLNs suffer with the challenges of exponential increased action space in deciding various parameter levels towards training and block generation. Motivated by aforementioned challenges of BFLNs, in this work, we are proposing an actor-critic Reinforcement Learning (RL) method to model the Machine Learning Model Owner (MLMO) in selecting the optimal set of parameter levels, addressing the challenges of exponential grow of action space in BFLNs. Further, due to the implicit entropy exploration, actor-critic RL method balances the exploration-exploitation trade-off and shows better performance than most off-policy methods, on large discrete action spaces. Therefore, in this work, considering the mobile scenario of the devices, MLMO decides the data and energy levels that the mobile devices use for the training and determine the block generation rate. This leads to minimized system latency and reduced overall cost, while achieving the target accuracy. Specifically, we have used Proximal Policy Optimization (PPO) as an on-policy actor-critic method with it's two variants, one based on Monte Carlo (MC) returns and another based on Generalized Advantage Estimate (GAE). We analyzed that PPO has better exploration and sample efficiency, lesser training time, and consistently higher cumulative rewards, when compared to off-policy Deep Q-Network (DQN).","PeriodicalId":13268,"journal":{"name":"IEEE Transactions on Sustainable Computing","volume":"9 4","pages":"648-660"},"PeriodicalIF":3.0000,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Resource Management Framework for Blockchain-Based Federated Learning in IoT Networks\",\"authors\":\"Aman Mishra;Yash Garg;Om Jee Pandey;Mahendra K. Shukla;Athanasios V. Vasilakos;Rajesh M. Hegde\",\"doi\":\"10.1109/TSUSC.2024.3358915\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"At present, the centralized learning models, used for IoT applications generating large amount of data, face several challenges such as bandwidth scarcity, more energy consumption, increased uses of computing resources, poor connectivity, high computational complexity, reduced privacy, and large latency towards data transfer. In order to address the aforementioned challenges, Blockchain-Enabled Federated Learning Networks (BFLNs) emerged recently, which deal with trained model parameters only, rather than raw data. BFLNs provide enhanced security along with improved energy-efficiency and Quality-of-Service (QoS). 
However, BFLNs suffer with the challenges of exponential increased action space in deciding various parameter levels towards training and block generation. Motivated by aforementioned challenges of BFLNs, in this work, we are proposing an actor-critic Reinforcement Learning (RL) method to model the Machine Learning Model Owner (MLMO) in selecting the optimal set of parameter levels, addressing the challenges of exponential grow of action space in BFLNs. Further, due to the implicit entropy exploration, actor-critic RL method balances the exploration-exploitation trade-off and shows better performance than most off-policy methods, on large discrete action spaces. Therefore, in this work, considering the mobile scenario of the devices, MLMO decides the data and energy levels that the mobile devices use for the training and determine the block generation rate. This leads to minimized system latency and reduced overall cost, while achieving the target accuracy. Specifically, we have used Proximal Policy Optimization (PPO) as an on-policy actor-critic method with it's two variants, one based on Monte Carlo (MC) returns and another based on Generalized Advantage Estimate (GAE). We analyzed that PPO has better exploration and sample efficiency, lesser training time, and consistently higher cumulative rewards, when compared to off-policy Deep Q-Network (DQN).\",\"PeriodicalId\":13268,\"journal\":{\"name\":\"IEEE Transactions on Sustainable Computing\",\"volume\":\"9 4\",\"pages\":\"648-660\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-01-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Sustainable Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10415198/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Sustainable Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10415198/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

At present, centralized learning models used for IoT applications that generate large amounts of data face several challenges, such as bandwidth scarcity, high energy consumption, increased use of computing resources, poor connectivity, high computational complexity, reduced privacy, and large data-transfer latency. To address these challenges, Blockchain-Enabled Federated Learning Networks (BFLNs) have recently emerged; they exchange only trained model parameters rather than raw data. BFLNs provide enhanced security along with improved energy efficiency and Quality-of-Service (QoS). However, BFLNs suffer from an exponentially growing action space when deciding the various parameter levels for training and block generation. Motivated by these challenges, in this work we propose an actor-critic Reinforcement Learning (RL) method to model the Machine Learning Model Owner (MLMO) in selecting the optimal set of parameter levels, addressing the exponential growth of the action space in BFLNs. Further, owing to its implicit entropy-driven exploration, the actor-critic RL method balances the exploration-exploitation trade-off and outperforms most off-policy methods on large discrete action spaces. Therefore, in this work, considering the mobility of the devices, the MLMO decides the data and energy levels that the mobile devices use for training and determines the block generation rate. This minimizes system latency and reduces overall cost while achieving the target accuracy. Specifically, we use Proximal Policy Optimization (PPO) as an on-policy actor-critic method with its two variants, one based on Monte Carlo (MC) returns and another based on the Generalized Advantage Estimate (GAE). Our analysis shows that PPO achieves better exploration and sample efficiency, shorter training time, and consistently higher cumulative rewards than the off-policy Deep Q-Network (DQN).
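
Two technical points in the abstract can be made concrete with a short sketch: the multiplicative (exponential in the number of devices) growth of the MLMO's discrete action space, and the Generalized Advantage Estimate used by one of the PPO variants. The sketch below is illustrative only, not the authors' implementation; the level counts, device count, and reward values are assumptions made for this example.

```python
# Minimal sketch (not the paper's code) of two ideas from the abstract:
# (i) the MLMO's joint discrete action space grows exponentially with the
#     number of devices, and (ii) GAE computes advantages from per-step
#     rewards and value predictions for the PPO-GAE variant.
import numpy as np

# (i) Assumed discretizations; actual levels are defined in the paper.
DATA_LEVELS = [1, 2, 3]      # data level per device
ENERGY_LEVELS = [1, 2, 3]    # energy level per device
BLOCK_RATES = [1, 2, 3, 4]   # block-generation rate
N_DEVICES = 5                # assumed number of mobile devices

per_device_choices = len(DATA_LEVELS) * len(ENERGY_LEVELS)
action_space_size = (per_device_choices ** N_DEVICES) * len(BLOCK_RATES)
print(f"Joint action space size: {action_space_size}")  # exponential in N_DEVICES

# (ii) Generalized Advantage Estimation for one episode.
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """rewards: shape (T,); values: shape (T+1,), last entry is the bootstrap value."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        gae = delta + gamma * lam * gae                          # discounted residual sum
        advantages[t] = gae
    return advantages

# Toy usage: random rewards/values stand in for the BFLN environment signal
# (which in the paper reflects latency, energy cost, and target accuracy).
rng = np.random.default_rng(0)
print(gae_advantages(rng.normal(size=10), rng.normal(size=11)))
```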
Source journal
IEEE Transactions on Sustainable Computing
Subject categories: Computer Science, Hardware & Architecture; Mathematics, Control and Optimization
CiteScore: 7.70
Self-citation rate: 2.60%
Articles published: 54