BatchQueue: Fast and Memory-Thrifty Core to Core Communication

2010 22nd International Symposium on Computer Architecture and High Performance Computing. Pub date: 2010-10-27. DOI: 10.1109/SBAC-PAD.2010.34

Thomas Preud'homme, Julien Sopena, Gaël Thomas, B. Folliot


Citations: 15

Abstract




Sequential applications can take advantage of multi-core systems through pipeline parallelism to improve their performance. In such parallelism, core-to-core communication overhead is the main limit on speedup. This paper presents BatchQueue, a fast and memory-thrifty core-to-core communication system based on batch processing of whole cache lines. BatchQueue can send a 32-bit word of data in just 12.5 ns on a Xeon X5472 and needs only 2 full cache lines plus 3 byte-sized variables, each on a different cache line for optimal performance, to work. The characteristics of BatchQueue (high throughput, and increased latency resulting from its batch processing) make it well suited to highly communicative tasks with no real-time requirements, such as monitoring.
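The abstract describes BatchQueue only at a high level. What follows is a minimal C11 sketch of the general idea under our own assumptions: a single-producer/single-consumer queue whose payload is two cache-line-sized buffers, where the producer fills one whole line before publishing it through a byte-sized flag that sits on its own cache line. The names (`batch_queue_t`, `bq_init`, `bq_send`, `bq_try_recv`), the 64-byte line size, and the two-flag handshake are hypothetical. This is not the paper's algorithm, and it does not reproduce how the paper arranges its three byte-sized variables.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (the usual size on x86). */
#define CACHE_LINE 64
/* A batch is one whole cache line of 32-bit words: 16 words. */
#define WORDS_PER_LINE (CACHE_LINE / sizeof(uint32_t))

/* Payload buffer: exactly one cache line. */
typedef struct {
    alignas(CACHE_LINE) uint32_t words[WORDS_PER_LINE];
} batch_t;

/* Byte-sized flag padded to its own cache line to avoid false sharing.
   1 = the matching batch is full and owned by the consumer. */
typedef struct {
    alignas(CACHE_LINE) atomic_uchar full;
} flag_t;

static_assert(sizeof(batch_t) == CACHE_LINE, "a batch must be one cache line");
static_assert(sizeof(flag_t) == CACHE_LINE, "each flag needs its own line");

/* Hypothetical layout: two payload lines, two flags, and per-side
   private cursors, each group on its own cache line. */
typedef struct {
    batch_t buf[2];
    flag_t ready[2];
    alignas(CACHE_LINE) unsigned char prod_buf; /* producer: buffer being filled */
    size_t prod_pos;                            /* producer: next free slot */
    alignas(CACHE_LINE) unsigned char cons_buf; /* consumer: buffer being drained */
    size_t cons_pos;                            /* consumer: next slot to read */
} batch_queue_t;

static void bq_init(batch_queue_t *q) {
    atomic_init(&q->ready[0].full, 0);
    atomic_init(&q->ready[1].full, 0);
    q->prod_buf = 0;
    q->prod_pos = 0;
    q->cons_buf = 0;
    q->cons_pos = 0;
}

/* Producer side (one thread only). Buffers one word and publishes the
   line only once it is full. Spins if the consumer still owns the buffer
   it is about to start filling. */
static void bq_send(batch_queue_t *q, uint32_t v) {
    unsigned char b = q->prod_buf;
    if (q->prod_pos == 0) {
        while (atomic_load_explicit(&q->ready[b].full, memory_order_acquire))
            ; /* wait for the consumer to hand this line back */
    }
    q->buf[b].words[q->prod_pos++] = v;
    if (q->prod_pos == WORDS_PER_LINE) {
        /* Release: the plain stores above become visible before the flag. */
        atomic_store_explicit(&q->ready[b].full, 1, memory_order_release);
        q->prod_buf = (unsigned char)(b ^ 1u);
        q->prod_pos = 0;
    }
}

/* Consumer side (one thread only). Non-blocking: returns 1 and writes
   *out if a published word is available, 0 otherwise. */
static int bq_try_recv(batch_queue_t *q, uint32_t *out) {
    unsigned char b = q->cons_buf;
    if (!atomic_load_explicit(&q->ready[b].full, memory_order_acquire))
        return 0; /* no complete line published yet */
    *out = q->buf[b].words[q->cons_pos++];
    if (q->cons_pos == WORDS_PER_LINE) {
        /* Release: the reads above happen-before any later overwrite
           by the producer, which acquires this flag. */
        atomic_store_explicit(&q->ready[b].full, 0, memory_order_release);
        q->cons_buf = (unsigned char)(b ^ 1u);
        q->cons_pos = 0;
    }
    return 1;
}
```

In use, `bq_send` runs on the producer core and `bq_try_recv` is polled on the consumer core. A word becomes visible only when its whole line is published. That gives high throughput, but a partially filled line waits until it fills, which is the latency trade-off the abstract mentions. A practical version would need some flush mechanism, which this sketch omits.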

