Large Utility Sorting on FPGAs
Kristiyan Manev, Dirk Koch
2018 International Conference on Field-Programmable Technology (FPT), published 2018-09-16
DOI: 10.1109/FPT.2018.00067 (https://doi.org/10.1109/FPT.2018.00067)
This paper presents a merge sorter capable of merging thousands of streams in a single run, with a logic cost that scales logarithmically with the number of streams merged. Moreover, we apply several performance tuning techniques, including speculative execution, deep pipelining, and optimized communication schemes between processing elements. An end-to-end case study on a Xilinx VC709 board merges 2048 sequences of the Graysort benchmark between two DRAMs at an effective throughput of 9.5 GB/s, or 1024 sequences at 10.3 GB/s.
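To make the idea of merging many sorted streams in a single run concrete, the following is a minimal software sketch of a binary tree of two-way mergers. It is an illustration only, not the hardware architecture described in the paper: the function names `merge_two` and `merge_tree` are hypothetical, and the sketch says nothing about how the authors achieve logarithmic logic cost, speculation, or pipelining on the FPGA. It only shows that a tree over k inputs has about log2(k) levels, so each element passes through about log2(k) comparisons.

```python
from typing import Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking an exhausted stream


def merge_two(a: Iterable, b: Iterable) -> Iterator:
    """Lazily merge two sorted streams into one sorted stream."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, _END), next(it_b, _END)
    while x is not _END and y is not _END:
        if x <= y:
            yield x
            x = next(it_a, _END)
        else:
            yield y
            y = next(it_b, _END)
    # Drain whichever stream still has elements.
    while x is not _END:
        yield x
        x = next(it_a, _END)
    while y is not _END:
        yield y
        y = next(it_b, _END)


def merge_tree(streams: List[Iterable]) -> Tuple[Iterator, int]:
    """Combine sorted streams with a binary tree of two-way mergers.

    Returns the merged stream and the number of tree levels built.
    """
    level = [iter(s) for s in streams]
    if not level:
        return iter(()), 0
    depth = 0
    while len(level) > 1:
        nxt = []
        # Pair up neighbours; an unpaired last stream moves up unchanged.
        for i in range(0, len(level) - 1, 2):
            nxt.append(merge_two(level[i], level[i + 1]))
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


if __name__ == "__main__":
    # Eight sorted streams that together contain 0..63 exactly once.
    inputs = [range(i, 64, 8) for i in range(8)]
    merged, depth = merge_tree(inputs)
    print(list(merged) == list(range(64)), depth)
```

In this sketch, a binary tree over 2048 input streams would have 11 levels and one over 1024 streams would have 10, which is the sense in which the depth of such a tree grows logarithmically with the number of streams.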