Accelerating Financial Market Server through Hybrid List Design (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Pub Date: 2017-02-22. DOI: 10.1145/3020078.3021775

H. Fu, Conghui He, Huabin Ruan, Itay Greenspon, W. Luk, Yongkang Zheng, Junfeng Liao, Qing Zhang, Guangwen Yang

Citations: 1

Abstract

Financial market servers in exchanges maintain order books and provide real-time market data feeds to traders, and low-latency processing is in great demand in financial trading. Although software solutions offer the flexibility to express algorithms in high-level programming models and to recompile quickly, they are becoming increasingly uncompetitive because of their long and unpredictable response times. Field-Programmable Gate Arrays (FPGAs) are now an established technology for processing streaming packets in hardware with low and constant latency. However, maintaining order books on FPGAs involves organizing packets into gigabytes of structured data and running complicated routines (sort, insertion, deletion, etc.), which is extremely challenging for FPGA designs in both design methodology and memory volume. Existing FPGA designs therefore often leave the post-processing part to the CPU, which largely cancels the latency gain of the network packet processing part. This paper proposes a CPU-FPGA hybrid list design that accelerates financial market servers to microsecond-level latencies.

This paper makes four main contributions. First, we design a two-level CPU-FPGA hybrid list: a small cache list on the FPGA and a large master list on the CPU host. The two lists are kept sorted with different schemes: bitonic sort is applied to the cache list, while a balanced tree maintains the master list. Second, to update the hybrid sorted list efficiently, we derive a complete set of low-latency routines, including insertion, deletion, selection, and sorting, each with a latency on the scale of a few cycles. Third, we propose a non-blocking, on-demand synchronization strategy through which the cache list and the master list communicate with each other. Lastly, we integrate the hybrid list with other components, such as packet splitting, parsing, and processing, to form an industry-level financial market server.

Our design is deployed in the environment of the China Financial Futures Exchange (CFFEX), where it has demonstrated its functionality and stability by running for 600+ hours while handling hundreds of millions of packets per day. Compared with the existing CPU-based solution at CFFEX, our system supports identical functionality while reducing latency from 100+ microseconds to 2 microseconds, a speedup of 50x.
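The abstract names bitonic sort as the scheme for the FPGA cache list but gives no details. As a rough illustration only, the textbook bitonic sorting network can be sketched in Python as follows. The function names `bitonic_sort` and `_bitonic_merge` are hypothetical, and this is a software model of the general technique, not the authors' hardware design. The inner loop of the merge step is where a hardware network would run all compare-exchange operations of one stage in parallel.

```python
def bitonic_sort(values, ascending=True):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sort the halves in opposite directions so their concatenation is bitonic.
    first = bitonic_sort(values[:half], True)
    second = bitonic_sort(values[half:], False)
    return _bitonic_merge(first + second, ascending)


def _bitonic_merge(values, ascending):
    """Turn a bitonic sequence into a sorted one (hypothetical helper)."""
    n = len(values)
    if n == 1:
        return list(values)
    half = n // 2
    values = list(values)
    for i in range(half):
        # Compare-exchange between element i and element i + half.
        # The pairs are independent, so hardware can evaluate them in parallel.
        if (values[i] > values[i + half]) == ascending:
            values[i], values[i + half] = values[i + half], values[i]
    left = _bitonic_merge(values[:half], ascending)
    right = _bitonic_merge(values[half:], ascending)
    return left + right
```

Because the comparison pattern is fixed and independent of the data, the number of stages depends only on the list length, which fits the constant-latency goal the abstract describes.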
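The two-level structure described in the abstract can be modelled in software to make the idea concrete. The sketch below is a minimal model under several assumptions that are not in the abstract: the class name `HybridList` and its methods are hypothetical, the cache is assumed to hold the smallest keys, and a `bisect`-maintained sorted list stands in for the balanced tree on the CPU side. Data moves between the two levels only when the cache overflows or underflows, which loosely mirrors the on-demand aspect of the synchronization; the non-blocking aspect is not modelled here.

```python
import bisect


class HybridList:
    """Toy model of a two-level sorted list: small cache plus large master."""

    def __init__(self, cache_capacity=4):
        self.cache_capacity = cache_capacity
        self.cache = []   # small sorted list (the FPGA side in the abstract)
        self.master = []  # sorted list standing in for a balanced tree (CPU side)

    def insert(self, key):
        # Invariant: every key in the cache is <= every key in the master.
        if self.master and key > self.master[0]:
            bisect.insort(self.master, key)
            return
        bisect.insort(self.cache, key)
        if len(self.cache) > self.cache_capacity:
            # Cache overflow: spill the largest cached key to the master.
            bisect.insort(self.master, self.cache.pop())

    def delete(self, key):
        i = bisect.bisect_left(self.cache, key)
        if i < len(self.cache) and self.cache[i] == key:
            del self.cache[i]
            if self.master:
                # Cache underflow: refill with the smallest master key.
                self.cache.append(self.master.pop(0))
            return True
        j = bisect.bisect_left(self.master, key)
        if j < len(self.master) and self.master[j] == key:
            del self.master[j]
            return True
        return False

    def best(self):
        """Return the smallest key, always served from the cache."""
        return self.cache[0] if self.cache else None
```

In this model the head of the list is always answered from the small cache, so the common-case lookup never touches the large master list. A real balanced tree would replace the `pop(0)` and `insort` calls on `master`, which are linear-time on a Python list.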
