High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments

IF 6.6 · Region 1, Computer Science · JCR Q1, Multidisciplinary

Tsinghua Science and Technology · Pub Date: 2024-02-09 · DOI: 10.26599/TST.2023.9010088

Azam Fazel-Najafabadi;Mahdi Abbasi;Hani H. Attar;Ayman Amer;Amir Taherkordi;Azad Shokrollahi;Mohammad R. Khosravi;Ahmed A. Solyman

Volume 29, Issue 4, pp. 1118-1137 · https://ieeexplore.ieee.org/document/10431734/

Citations: 0

Abstract



The network switches in the data plane of Software Defined Networking (SDN) rely on an elementary process in which an enormous number of packets, comparable to big volumes of data, are classified into specific flows by matching them against a set of dynamic rules. This process accelerates data handling: instead of processing individual packets repeatedly, the switch applies the corresponding actions to whole flows of packets. In this paper, we first address the limitations of a typical packet classification algorithm, Tuple Space Search (TSS). We then present a set of scenarios for parallelizing it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster provides the best platform for parallelizing packet classification algorithms, achieving an average throughput of 4.2 million packets per second (Mpps). That is, the hybrid cluster built by integrating the Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), and OpenMP programming models classifies 0.24 million packets per second more than the GPU cluster scheme. Such a packet classifier satisfies the processing speed required by programmable network systems used to communicate big medical data.
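To make the Tuple Space Search idea concrete, here is a minimal single-threaded sketch in Python. It is not the authors' implementation: the two-field IPv4 rule format, the `TupleSpace` class, and the `add_rule`/`classify` names are illustrative assumptions. Rules are grouped by their prefix-length tuple, each group is a hash table, and a packet is classified by probing every group once and keeping the highest-priority hit.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def mask(addr: int, prefix_len: int) -> int:
    """Keep only the top prefix_len bits of a 32-bit IPv4 address."""
    if prefix_len == 0:
        return 0
    return addr & ((0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF)


class TupleSpace:
    """Hypothetical two-field (source, destination) tuple space classifier."""

    def __init__(self):
        # (src_len, dst_len) -> {(masked_src, masked_dst): (priority, action)}
        self.tables = defaultdict(dict)

    def add_rule(self, src_cidr, dst_cidr, priority, action):
        src = ip_network(src_cidr)
        dst = ip_network(dst_cidr)
        key = (int(src.network_address), int(dst.network_address))
        table = self.tables[(src.prefixlen, dst.prefixlen)]
        # If two rules share a key, keep the higher-priority one.
        if key not in table or priority > table[key][0]:
            table[key] = (priority, action)

    def classify(self, src_ip, dst_ip):
        src = int(ip_address(src_ip))
        dst = int(ip_address(dst_ip))
        best = None
        # One hash probe per tuple; the probes do not depend on each other.
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((mask(src, src_len), mask(dst, dst_len)))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return best[1] if best is not None else None


if __name__ == "__main__":
    ts = TupleSpace()
    ts.add_rule("10.0.0.0/8", "0.0.0.0/0", 1, "forward")
    ts.add_rule("10.1.0.0/16", "192.168.1.0/24", 5, "drop")
    print(ts.classify("10.1.2.3", "192.168.1.9"))
    print(ts.classify("10.9.9.9", "8.8.8.8"))
```

Because each tuple is probed independently, and each packet is classified independently of the others, the loop in `classify` and the stream of packets are both natural units of work to distribute across threads, processes, or GPU threads. How the paper actually partitions the work among CUDA, MPI, and OpenMP is not stated in the abstract.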


Source journal

Tsinghua Science and Technology
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore: 10.20

Self-citation rate: 10.60%

Articles published: 2340

About the journal: Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.