VELO: A Novel Communication Engine for Ultra-Low Latency Message Transfers

2008 37th International Conference on Parallel Processing. Pub Date: 2008-09-09. DOI: 10.1109/ICPP.2008.85

Heiner Litz, H. Fröning, M. Nüssle, U. Brüning


Citations: 34

Abstract

This paper presents a novel stateless, virtualized communication engine for sub-microsecond latency. Using a prototype based on a field-programmable gate array (FPGA), we show a latency of 970 ns between two machines with our Virtualized Engine for Low Overhead (VELO). The FPGA device is directly connected to the CPUs by a HyperTransport link. The described hardware architecture is optimized for small messages and avoids the overhead typically found with transfers controlled by direct memory access (DMA). The stateless approach allows the hardware unit to be used directly from many threads and processes simultaneously. It provides secure user-level communication with an extremely optimized start-up phase. Microbenchmark results are reported for both a proprietary API and Open MPI.
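The abstract does not say how the 970 ns figure was measured. Latency numbers of this kind are commonly obtained with a ping-pong microbenchmark, in which one side sends a small message, the other echoes it back, and half of the mean round-trip time is taken as the one-way latency. The following is a minimal sketch of that technique under stated assumptions: it uses a local socket pair between two threads as a stand-in transport, it does not use VELO or its API, and the names `recv_exact`, `echo_server`, and `ping_pong_latency_ns` are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping over short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(sock: socket.socket, rounds: int, size: int) -> None:
    """Echo each fixed-size message straight back to the sender."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def ping_pong_latency_ns(rounds: int = 1000, size: int = 8) -> float:
    """Estimate one-way latency in ns as half the mean round-trip time.

    Assumption: the transport is a local socket pair, not a VELO device,
    so the result only illustrates the measurement method.
    """
    a, b = socket.socketpair()
    server = threading.Thread(target=echo_server, args=(b, rounds, size))
    server.start()
    payload = b"x" * size
    try:
        start = time.perf_counter_ns()
        for _ in range(rounds):
            a.sendall(payload)      # ping
            recv_exact(a, size)     # pong
        elapsed = time.perf_counter_ns() - start
    finally:
        a.close()        # unblocks the echo thread if the loop failed early
        server.join()
        b.close()
    return elapsed / rounds / 2


if __name__ == "__main__":
    print(f"{ping_pong_latency_ns():.0f} ns (local socket pair, not VELO)")
```

Small payloads such as the 8 bytes used here match the abstract's focus on small messages. The value printed reflects the operating system's socket path on the local machine and is not comparable to the hardware result reported in the paper.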

