Flow Scheduling in a Heterogeneous NFV Environment using Reinforcement Learning

Chun Jen Lin, Yan Luo, Liang-Min Wang, Li-De Chen
{"title":"基于强化学习的异构NFV环境流调度","authors":"Chun Jen Lin, Yan Luo, Liang-Min Wang, Li-De Chen","doi":"10.1109/nas51552.2021.9605395","DOIUrl":null,"url":null,"abstract":"Network function virtualization (NFV) allows net-work functions executed on general-purpose servers or virtual machines (VMs) instead of proprietary hardware, greatly improving the flexibility and scalability of network services. Recent trends in using programmable accelerators to speed up NFV performance introduce challenges in flow scheduling in a dynamic NFV environment. Reinforcement learning (RL) trains machine learning models for decision making to maximize returns in uncertain environments such as NFV. In this paper, we study the allocation of heterogeneous processors (CPUs and FPGAs) to minimize the delays of flows in the system. We conduct extensive simulations to evaluate the performance of reinforcement learning based scheduling algorithms such as Advantage Actor Critic (A2C), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), and compare with greedy policies. The results show that RL based schedulers can effectively learn from past experiences and converge to the optimal greedy policy. We also analyze in-depth how the policies lead to different processor utilization and flow processing time, and provide insights into these policies.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Flow Scheduling in a Heterogeneous NFV Environment using Reinforcement Learning\",\"authors\":\"Chun Jen Lin, Yan Luo, Liang-Min Wang, Li-De Chen\",\"doi\":\"10.1109/nas51552.2021.9605395\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Network function virtualization (NFV) allows net-work functions executed on general-purpose servers or virtual machines (VMs) instead of proprietary hardware, greatly improving the flexibility and scalability of network services. Recent trends in using programmable accelerators to speed up NFV performance introduce challenges in flow scheduling in a dynamic NFV environment. Reinforcement learning (RL) trains machine learning models for decision making to maximize returns in uncertain environments such as NFV. In this paper, we study the allocation of heterogeneous processors (CPUs and FPGAs) to minimize the delays of flows in the system. We conduct extensive simulations to evaluate the performance of reinforcement learning based scheduling algorithms such as Advantage Actor Critic (A2C), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), and compare with greedy policies. The results show that RL based schedulers can effectively learn from past experiences and converge to the optimal greedy policy. 
We also analyze in-depth how the policies lead to different processor utilization and flow processing time, and provide insights into these policies.\",\"PeriodicalId\":135930,\"journal\":{\"name\":\"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/nas51552.2021.9605395\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/nas51552.2021.9605395","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Network function virtualization (NFV) allows network functions to be executed on general-purpose servers or virtual machines (VMs) instead of proprietary hardware, greatly improving the flexibility and scalability of network services. Recent trends in using programmable accelerators to speed up NFV performance introduce challenges in flow scheduling in a dynamic NFV environment. Reinforcement learning (RL) trains machine learning models for decision making to maximize returns in uncertain environments such as NFV. In this paper, we study the allocation of heterogeneous processors (CPUs and FPGAs) to minimize the delays of flows in the system. We conduct extensive simulations to evaluate the performance of reinforcement learning based scheduling algorithms such as Advantage Actor-Critic (A2C), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), and compare them with greedy policies. The results show that RL-based schedulers can effectively learn from past experiences and converge to the optimal greedy policy. We also analyze in depth how the policies lead to different processor utilization and flow processing times, and provide insights into these policies.
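
The paper itself provides no code, but the problem framed in the abstract, assigning each arriving flow to a CPU or FPGA so as to minimize its delay, maps naturally onto a standard RL environment. Below is a minimal, purely illustrative sketch under assumed parameters: the FlowSchedulingEnv class, the processor service rates, and the flow-size distribution are all invented for illustration, not taken from the paper, and the training call uses stable-baselines3's off-the-shelf PPO rather than the authors' implementation.

```python
# Illustrative sketch only -- not the authors' simulator. It casts the
# abstract's problem (route each arriving flow to a CPU or an FPGA to
# minimize queuing delay) as a gymnasium environment and trains PPO on
# it. All numeric parameters are made-up placeholders.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class FlowSchedulingEnv(gym.Env):
    """Toy heterogeneous scheduler: n_cpus CPUs + n_fpgas FPGAs; the action is a processor index."""

    def __init__(self, n_cpus=2, n_fpgas=1, episode_len=200):
        super().__init__()
        self.n_procs = n_cpus + n_fpgas
        # Hypothetical service rates: the FPGA drains work faster than a CPU.
        self.rates = np.array([1.0] * n_cpus + [4.0] * n_fpgas)
        self.episode_len = episode_len
        # Observation: per-processor backlog (work units) plus the size
        # of the flow currently awaiting assignment.
        self.observation_space = spaces.Box(0.0, np.inf, (self.n_procs + 1,), np.float32)
        self.action_space = spaces.Discrete(self.n_procs)

    def _obs(self):
        return np.append(self.backlog, self.flow_size).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.backlog = np.zeros(self.n_procs)
        self.flow_size = self.np_random.exponential(2.0)
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        # Delay seen by this flow: time to drain the chosen queue plus
        # the flow's own service time on that processor.
        delay = (self.backlog[action] + self.flow_size) / self.rates[action]
        self.backlog[action] += self.flow_size
        # One time unit elapses; each processor drains at its own rate.
        self.backlog = np.maximum(self.backlog - self.rates, 0.0)
        self.flow_size = self.np_random.exponential(2.0)
        self.t += 1
        # Negative delay as reward, so maximizing return minimizes delay.
        return self._obs(), -delay, False, self.t >= self.episode_len, {}

env = FlowSchedulingEnv()
model = PPO("MlpPolicy", env, verbose=0)  # A2C trains the same way: swap PPO for A2C
model.learn(total_timesteps=20_000)
```

Under the same assumptions, the greedy baseline the RL agents are compared against would simply pick np.argmin((backlog + flow_size) / rates) for every arriving flow; the abstract's reported result is that the learned policies converge toward exactly that optimal greedy behavior.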