Modular high-throughput and low-latency sorting units for FPGAs in the Large Hadron Collider

2011 IEEE 9th Symposium on Application Specific Processors (SASP) · Pub Date: 2011-06-05 · DOI: 10.1109/SASP.2011.5941075

Amin Farmahini Farahani, A. Gregerson, M. Schulte, Katherine Compton

Citations: 16

Abstract

This paper presents efficient techniques for designing high-throughput, low-latency sorting units for FPGA implementation. Our sorting units use modular design techniques that hierarchically construct large sorting units from smaller building blocks. They are optimized for situations in which only the M largest numbers from N inputs are needed; this situation commonly occurs in high-energy physics experiments and other forms of digital signal processing. Based on these techniques, we design parameterized, pipelined sorting units. A detailed analysis indicates that their resource requirements scale linearly with the number of inputs, latencies scale logarithmically with the number of inputs, and frequencies remain fairly constant. Synthesis results indicate that a single pipelined 256-to-4 sorting unit with 19 stages can perform 200 million sorts per second with a latency of about 95 ns per sort on a Virtex-5 FPGA.
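
The abstract's central idea, keeping only the M largest of N inputs by composing a large unit from smaller ones, can be modeled in software. The sketch below is a Python behavioral model under stated assumptions, not the authors' hardware design: `sort_desc`, `merge_top_m`, `top_m_hierarchical`, `pipeline_latency_ns`, and the group size of 16 are hypothetical choices, and the sequential merge loop stands in for what would be a fixed comparator network on an FPGA. It also checks the abstract's timing figures, assuming the pipeline accepts one new sort per clock cycle: 200 million sorts per second then implies a 200 MHz clock, so 19 stages at 5 ns each give 95 ns.

```python
from typing import List, Tuple


def sort_desc(block: List[int]) -> List[int]:
    """Stand-in for a small fixed-size sorting network (hypothetical
    building block); modeled here with Python's built-in sort."""
    return sorted(block, reverse=True)


def merge_top_m(a: List[int], b: List[int], m: int) -> List[int]:
    """Merge two descending lists, keeping only the m largest values."""
    out: List[int] = []
    i = j = 0
    while len(out) < m and (i < len(a) or j < len(b)):
        # Take from `a` when `b` is exhausted or a's head is at least as large.
        if j >= len(b) or (i < len(a) and a[i] >= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def top_m_hierarchical(values: List[int], m: int, group: int) -> Tuple[List[int], int, int]:
    """Return (m largest values in descending order, merge levels, merge units).

    Level 0 sorts each group of `group` inputs and keeps its top m; each later
    level merges pairs of top-m lists in a binary tree (hypothetical structure).
    """
    if not values:
        return [], 0, 0
    lists = [sort_desc(values[k:k + group])[:m] for k in range(0, len(values), group)]
    levels = units = 0
    while len(lists) > 1:
        nxt = []
        for k in range(0, len(lists), 2):
            if k + 1 < len(lists):
                nxt.append(merge_top_m(lists[k], lists[k + 1], m))
                units += 1
            else:
                nxt.append(lists[k])  # odd list passes through to the next level
        lists = nxt
        levels += 1
    return lists[0], levels, units


def pipeline_latency_ns(stages: int, f_mhz: float) -> float:
    """Latency of a pipeline that advances one stage per clock cycle."""
    return stages * 1000.0 / f_mhz


if __name__ == "__main__":
    import random

    random.seed(0)
    vals = [random.randrange(1 << 10) for _ in range(256)]
    top, levels, units = top_m_hierarchical(vals, m=4, group=16)
    assert top == sorted(vals, reverse=True)[:4]
    print(top, levels, units)            # levels == 4, units == 15
    print(pipeline_latency_ns(19, 200))  # 95.0
```

With 256 inputs split into 16 groups, the merge tree has log2(16) = 4 levels and 16 - 1 = 15 merge units. For a fixed group size and M, this mirrors the abstract's claim that latency grows logarithmically and resources grow linearly with the number of inputs; it works because the M largest values of a union are always contained in the union of each part's M largest values.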

Source venue

2011 IEEE 9th Symposium on Application Specific Processors (SASP)