对在ALICE O系统中传输大数据量的消息队列库和网络技术进行基准测试

2016 IEEE-NPSS Real Time Conference (RT) Pub Date : 2016-08-15 DOI:10.1109/RTC.2016.7543162

V. C. Barroso, U. Fuchs, A. Wegrzynek

{"title":"对在ALICE O系统中传输大数据量的消息队列库和网络技术进行基准测试","authors":"V. C. Barroso, U. Fuchs, A. Wegrzynek","doi":"10.1109/RTC.2016.7543162","DOIUrl":null,"url":null,"abstract":"ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). ALICE has been successfully collecting physics data of Run 2 since spring 2015. In parallel, preparations for a major upgrade, called O2 (Online-Offline) and scheduled for the Long Shutdown 2 in 2019-2020, are being made. One of the major requirements is the capacity to transport data between so-called FLPs (First Level Processors), equipped with readout cards, and the EPNs (Event Processing Node), performing data aggregation, frame building and partial reconstruction. It is foreseen to have 268 FLPs dispatching data to 1500 EPNs with an average output of 20 Gb/s each. In overall, the O2 processing system will operate at terabits per second of throughput while handling millions of concurrent connections. To meet these requirements, the software and hardware layers of the new system need to be fully evaluated. In order to achieve a high performance to cost ratio three networking technologies (Ethernet, InfiniBand and Omni-Path) were benchmarked on Intel and IBM platforms. The core of the new transport layer will be based on a message queue library that supports push-pull and request-reply communication patterns and multipart messages. ZeroMQ and nanomsg are being evaluated as candidates and were tested in detail over the selected network technologies. This paper describes the benchmark programs and setups that were used during the tests, the significance of tuned kernel parameters, the configuration of network driver and the tuning of multi-core, multi-CPU, and NUMA (Non-Uniform Memory Access) architecture. It presents, compares and comments the final results. Eventually, it indicates the most efficient network technology and message queue library pair and provides an evaluation of the needed CPU and memory resources to handle foreseen traffic.","PeriodicalId":383702,"journal":{"name":"2016 IEEE-NPSS Real Time Conference (RT)","volume":"297 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Benchmarking message queue libraries and network technologies to transport large data volume in the ALICE O system\",\"authors\":\"V. C. Barroso, U. Fuchs, A. Wegrzynek\",\"doi\":\"10.1109/RTC.2016.7543162\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). ALICE has been successfully collecting physics data of Run 2 since spring 2015. In parallel, preparations for a major upgrade, called O2 (Online-Offline) and scheduled for the Long Shutdown 2 in 2019-2020, are being made. One of the major requirements is the capacity to transport data between so-called FLPs (First Level Processors), equipped with readout cards, and the EPNs (Event Processing Node), performing data aggregation, frame building and partial reconstruction. It is foreseen to have 268 FLPs dispatching data to 1500 EPNs with an average output of 20 Gb/s each. In overall, the O2 processing system will operate at terabits per second of throughput while handling millions of concurrent connections. To meet these requirements, the software and hardware layers of the new system need to be fully evaluated. In order to achieve a high performance to cost ratio three networking technologies (Ethernet, InfiniBand and Omni-Path) were benchmarked on Intel and IBM platforms. The core of the new transport layer will be based on a message queue library that supports push-pull and request-reply communication patterns and multipart messages. ZeroMQ and nanomsg are being evaluated as candidates and were tested in detail over the selected network technologies. This paper describes the benchmark programs and setups that were used during the tests, the significance of tuned kernel parameters, the configuration of network driver and the tuning of multi-core, multi-CPU, and NUMA (Non-Uniform Memory Access) architecture. It presents, compares and comments the final results. Eventually, it indicates the most efficient network technology and message queue library pair and provides an evaluation of the needed CPU and memory resources to handle foreseen traffic.\",\"PeriodicalId\":383702,\"journal\":{\"name\":\"2016 IEEE-NPSS Real Time Conference (RT)\",\"volume\":\"297 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE-NPSS Real Time Conference (RT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RTC.2016.7543162\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE-NPSS Real Time Conference (RT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RTC.2016.7543162","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

ALICE(大型离子对撞机实验)是欧洲核子研究中心(CERN)大型强子对撞机(LHC)设计的用于研究强相互作用物质和夸克-胶子等离子体物理的重离子探测器。自2015年春季以来，ALICE已经成功地收集了Run 2的物理数据。与此同时，一项名为O2(在线-离线)的重大升级的准备工作也在进行中，该升级计划于2019-2020年长期停工。其中一个主要要求是在配备读出卡的所谓的flp(一级处理器)和epn(事件处理节点)之间传输数据的能力，执行数据聚合、框架构建和部分重建。预计将有268个flp将数据分发到1500个epn，每个epn的平均输出为20 Gb/s。总的来说，O2处理系统将以每秒太比特的吞吐量运行，同时处理数百万个并发连接。为了满足这些要求，需要对新系统的软件和硬件层进行充分的评估。为了实现高性能成本比，三种网络技术(以太网，InfiniBand和Omni-Path)在英特尔和IBM平台上进行了基准测试。新传输层的核心将基于消息队列库，该库支持推拉和请求-应答通信模式以及多部分消息。ZeroMQ和nanomsg正在作为候选网络进行评估，并在选定的网络技术上进行了详细测试。本文介绍了测试过程中使用的基准程序和设置，调优内核参数的意义，网络驱动程序的配置以及多核、多cpu和NUMA(非统一内存访问)体系结构的调优。它展示、比较和评论了最终结果。最后，它指出了最有效的网络技术和消息队列库对，并提供了处理可预见流量所需的CPU和内存资源的评估。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Benchmarking message queue libraries and network technologies to transport large data volume in the ALICE O system

ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). ALICE has been successfully collecting physics data of Run 2 since spring 2015. In parallel, preparations for a major upgrade, called O2 (Online-Offline) and scheduled for the Long Shutdown 2 in 2019-2020, are being made. One of the major requirements is the capacity to transport data between so-called FLPs (First Level Processors), equipped with readout cards, and the EPNs (Event Processing Node), performing data aggregation, frame building and partial reconstruction. It is foreseen to have 268 FLPs dispatching data to 1500 EPNs with an average output of 20 Gb/s each. In overall, the O2 processing system will operate at terabits per second of throughput while handling millions of concurrent connections. To meet these requirements, the software and hardware layers of the new system need to be fully evaluated. In order to achieve a high performance to cost ratio three networking technologies (Ethernet, InfiniBand and Omni-Path) were benchmarked on Intel and IBM platforms. The core of the new transport layer will be based on a message queue library that supports push-pull and request-reply communication patterns and multipart messages. ZeroMQ and nanomsg are being evaluated as candidates and were tested in detail over the selected network technologies. This paper describes the benchmark programs and setups that were used during the tests, the significance of tuned kernel parameters, the configuration of network driver and the tuning of multi-core, multi-CPU, and NUMA (Non-Uniform Memory Access) architecture. It presents, compares and comments the final results. Eventually, it indicates the most efficient network technology and message queue library pair and provides an evaluation of the needed CPU and memory resources to handle foreseen traffic.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE-NPSS Real Time Conference (RT)

自引率

0.00%

发文量