CAF: Core to core Communication Acceleration Framework

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT). Pub Date: 2016-09-11. DOI: 10.1145/2967938.2967954

Yipeng Wang, Ren Wang, Andrew J. Herdrich, James Tsai, Yan Solihin


Citations: 18

Abstract

As the number of cores in a multicore system increases, core-to-core (C2C) communication increasingly limits the performance scaling of workloads that share data frequently. Traditionally, cores communicate through a shared memory space. However, shared-memory communication fundamentally involves coherence invalidations and cache misses, which cause large performance overheads and generate a large amount of network traffic. Many important workloads incur significant C2C communication and are strongly affected by these costs, including pipelined packet processing, which is widely used in software-based networking solutions. In these workloads, threads run on different cores and use software queues to pass packets from one core to another for different stages of processing. In this paper, we analyze the behavior and overheads of software queue management. Based on this analysis, we propose a novel C2C Communication Acceleration Framework (CAF) to optimize C2C communication. CAF offloads substantial communication burdens from cores and memory to a designated, efficient hardware device attached to the Network on Chip, which we refer to as the Queue Management Device (QMD). CAF combines hardware and software optimizations to reduce queue-induced communication overheads and improve overall system performance by 2-12× over traditional software queue implementations.
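
To make the baseline concrete, the following is a minimal sketch of the kind of software queue the abstract describes: a single-producer/single-consumer ring that passes packets between two pipeline stages running on separate threads. It is not the paper's implementation and does not model CAF or the QMD. The names `SpscRing` and `run_pipeline` and the per-stage work are hypothetical, and Python does not expose cache behavior, so the comments only point out which shared fields would cause coherence traffic in a native shared-memory version.

```python
import threading
import time


class SpscRing:
    """Single-producer/single-consumer ring queue (illustrative sketch only).

    In a native shared-memory implementation, `head`, `tail`, and the slots
    live in cache lines that both cores touch, so each update by one core
    invalidates the copy cached by the other core.
    """

    def __init__(self, capacity):
        # One slot stays empty so that full and empty states are distinguishable.
        self.slots = [None] * (capacity + 1)
        self.head = 0  # next slot to read; written only by the consumer
        self.tail = 0  # next slot to write; written only by the producer

    def try_enqueue(self, item):
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:      # producer reads the consumer-owned index
            return False          # queue is full
        self.slots[self.tail] = item
        self.tail = nxt           # publish the slot only after filling it
        return True

    def try_dequeue(self):
        if self.head == self.tail:  # consumer reads the producer-owned index
            return False, None      # queue is empty
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        return True, item


def run_pipeline(packets, capacity=4):
    """Pass packets from a first stage to a second stage through one ring."""
    ring = SpscRing(capacity)
    results = []
    done = object()  # sentinel marking the end of the packet stream

    def stage1():
        for pkt in list(packets) + [done]:
            item = pkt if pkt is done else pkt.upper()  # stand-in for stage-1 work
            while not ring.try_enqueue(item):
                time.sleep(0)  # queue full: yield and retry

    def stage2():
        while True:
            ok, item = ring.try_dequeue()
            if not ok:
                time.sleep(0)  # queue empty: yield and retry
                continue
            if item is done:
                return
            results.append((item, len(item)))  # stand-in for stage-2 work

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline(["syn", "ack"])` returns the packets in arrival order after both stages have processed them. Every enqueue and dequeue reads an index owned by the other thread, which is the sharing pattern the abstract identifies as the source of invalidations and cache misses.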
