流式排序网络

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI:10.1145/2854150

M. Zuluaga, Peter Milder, Markus Püschel

{"title":"流式排序网络","authors":"M. Zuluaga, Peter Milder, Markus Püschel","doi":"10.1145/2854150","DOIUrl":null,"url":null,"abstract":"Sorting is a fundamental problem in computer science and has been studied extensively. Thus, a large variety of sorting methods exist for both software and hardware implementations. For the latter, there is a trade-off between the throughput achieved and the cost (i.e., the logic and storage invested to sort n elements). Two popular solutions are bitonic sorting networks with O(nlog 2n) logic and storage, which sort n elements per cycle, and linear sorters with O(n) logic and storage, which sort n elements per n cycles. In this article, we present new hardware structures that we call streaming sorting networks, which we derive through a mathematical formalism that we introduce, and an accompanying domain-specific hardware generator that translates our formal mathematical description into synthesizable RTL Verilog. With the new networks, we achieve novel and improved cost-performance trade-offs. For example, assuming that n is a two-power and w is any divisor of n, one class of these networks can sort in n/;w cycles with O(wlog 2n) logic and O(nlog 2n) storage; the other class that we present sorts in nlog 2n/;w cycles with O(w) logic and O(n) storage. We carefully analyze the performance of these networks and their cost at three levels of abstraction: (1) asymptotically, (2) exactly in terms of the number of basic elements needed, and (3) in terms of the resources required by the actual circuit when mapped to a field-programmable gate array. The accompanying hardware generator allows us to explore the entire design space, identify the Pareto-optimal solutions, and show superior cost-performance trade-offs compared to prior work.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"22 1","pages":"55:1-55:30"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":"{\"title\":\"Streaming Sorting Networks\",\"authors\":\"M. Zuluaga, Peter Milder, Markus Püschel\",\"doi\":\"10.1145/2854150\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sorting is a fundamental problem in computer science and has been studied extensively. Thus, a large variety of sorting methods exist for both software and hardware implementations. For the latter, there is a trade-off between the throughput achieved and the cost (i.e., the logic and storage invested to sort n elements). Two popular solutions are bitonic sorting networks with O(nlog 2n) logic and storage, which sort n elements per cycle, and linear sorters with O(n) logic and storage, which sort n elements per n cycles. In this article, we present new hardware structures that we call streaming sorting networks, which we derive through a mathematical formalism that we introduce, and an accompanying domain-specific hardware generator that translates our formal mathematical description into synthesizable RTL Verilog. With the new networks, we achieve novel and improved cost-performance trade-offs. For example, assuming that n is a two-power and w is any divisor of n, one class of these networks can sort in n/;w cycles with O(wlog 2n) logic and O(nlog 2n) storage; the other class that we present sorts in nlog 2n/;w cycles with O(w) logic and O(n) storage. We carefully analyze the performance of these networks and their cost at three levels of abstraction: (1) asymptotically, (2) exactly in terms of the number of basic elements needed, and (3) in terms of the resources required by the actual circuit when mapped to a field-programmable gate array. The accompanying hardware generator allows us to explore the entire design space, identify the Pareto-optimal solutions, and show superior cost-performance trade-offs compared to prior work.\",\"PeriodicalId\":7063,\"journal\":{\"name\":\"ACM Trans. Design Autom. Electr. Syst.\",\"volume\":\"22 1\",\"pages\":\"55:1-55:30\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"34\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Trans. Design Autom. Electr. Syst.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2854150\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Trans. Design Autom. Electr. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2854150","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 34

摘要

排序是计算机科学中的一个基本问题，已经得到了广泛的研究。因此，在软件和硬件实现中都存在各种各样的排序方法。对于后者，在实现的吞吐量和成本(即对n个元素排序所投入的逻辑和存储)之间进行权衡。两种流行的解决方案是具有O(nlog 2n)逻辑和存储的双元排序网络，每个循环对n个元素进行排序，以及具有O(n)逻辑和存储的线性排序器，每n个循环对n个元素进行排序。在本文中，我们介绍了称为流排序网络的新硬件结构，它是通过我们引入的数学形式化推导出来的，以及附带的特定于领域的硬件生成器，它将我们的形式化数学描述转换为可合成的RTL Verilog。有了新的网络，我们实现了新颖和改进的成本效益权衡。例如，假设n是二次幂，w是n的任意因数，一类网络可以在n/ w个周期内排序，逻辑为O(wlog 2n)，存储为O(nlog 2n);我们介绍的另一类排序周期为nlog 2n/;w，逻辑为O(w)，存储为O(n)。我们在三个抽象层次上仔细分析了这些网络的性能及其成本:(1)渐近地，(2)根据所需基本元素的数量精确地，以及(3)根据映射到现场可编程门阵列时实际电路所需的资源。附带的硬件生成器允许我们探索整个设计空间，确定帕累托最优解决方案，并与之前的工作相比显示优越的成本性能权衡。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Streaming Sorting Networks

Sorting is a fundamental problem in computer science and has been studied extensively. Thus, a large variety of sorting methods exist for both software and hardware implementations. For the latter, there is a trade-off between the throughput achieved and the cost (i.e., the logic and storage invested to sort n elements). Two popular solutions are bitonic sorting networks with O(nlog ²n) logic and storage, which sort n elements per cycle, and linear sorters with O(n) logic and storage, which sort n elements per n cycles. In this article, we present new hardware structures that we call streaming sorting networks, which we derive through a mathematical formalism that we introduce, and an accompanying domain-specific hardware generator that translates our formal mathematical description into synthesizable RTL Verilog. With the new networks, we achieve novel and improved cost-performance trade-offs. For example, assuming that n is a two-power and w is any divisor of n, one class of these networks can sort in n/;w cycles with O(wlog ²n) logic and O(nlog ²n) storage; the other class that we present sorts in nlog ²n/;w cycles with O(w) logic and O(n) storage. We carefully analyze the performance of these networks and their cost at three levels of abstraction: (1) asymptotically, (2) exactly in terms of the number of basic elements needed, and (3) in terms of the resources required by the actual circuit when mapped to a field-programmable gate array. The accompanying hardware generator allows us to explore the entire design space, identify the Pareto-optimal solutions, and show superior cost-performance trade-offs compared to prior work.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Trans. Design Autom. Electr. Syst.

自引率

0.00%

发文量