Greatly Accelerated Scaling of Streaming Problems with A Migrating Thread Architecture

2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3) Pub Date: 2021-11-01 DOI: 10.1109/IA354616.2021.00009

Brian A. Page, P. Kogge


Citations: 1

Abstract

Applications in which continuous streams of data are passed through large data structures are becoming increasingly important. However, their execution on conventional architectures is highly inefficient, especially when parallelism is used to boost performance. The primary issue is often the need to stream large numbers of disparate data items through the equivalent of very large hash tables distributed across many nodes. This paper builds on prior work on the Firehose streaming benchmark, where an emerging architecture using threads that can migrate through memory has been shown to be much more efficient at such problems. This paper extends that work to a second-generation system to show not only the same improved efficiency (10X) at larger core counts, but also significantly higher raw performance (with FPGA-based cores running at 1/10th the clock of conventional systems). Further, this additional data yields insight into which resources are the bottlenecks to even higher performance, and supports a reasonable projection that implementing such an architecture with current technology would lead to a 10X performance gain on an apples-to-apples basis with conventional systems.
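The access pattern the abstract describes, disparate keyed items streamed through a hash table distributed across many nodes, can be illustrated with a small sketch. This is not the paper's implementation and not the Firehose benchmark's logic: the names `owner_node`, `ShardedCounter`, and `run_stream`, the CRC32 hash, and the round-robin arrival order are all assumptions chosen for illustration. The sketch only counts how many updates land on a node other than the one where the item arrived, which is the traffic a conventional system would handle with messages and a migrating-thread system would handle by moving the thread to the data.

```python
import zlib
from collections import defaultdict


def owner_node(key: str, num_nodes: int) -> int:
    """Map a key to the node that owns its hash-table slot (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class ShardedCounter:
    """A hash table split into one shard per node; each shard counts its own keys."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.shards = [defaultdict(int) for _ in range(num_nodes)]
        # Updates whose owning node differs from the node where the item arrived.
        self.remote_updates = 0

    def update(self, key: str, arrival_node: int) -> int:
        """Count one occurrence of key on its owning shard; return the new count."""
        owner = owner_node(key, self.num_nodes)
        if owner != arrival_node:
            self.remote_updates += 1
        self.shards[owner][key] += 1
        return self.shards[owner][key]


def run_stream(items, num_nodes: int) -> ShardedCounter:
    """Feed items through the table, assuming they arrive round-robin across nodes."""
    table = ShardedCounter(num_nodes)
    for i, key in enumerate(items):
        table.update(key, arrival_node=i % num_nodes)
    return table


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a"]
    table = run_stream(stream, num_nodes=4)
    print("remote updates:", table.remote_updates, "of", len(stream))
```

Under the assumption that arrival nodes are independent of the keys and the hash spreads keys evenly over N nodes, roughly (N-1)/N of updates are remote, so the share of remote accesses grows toward 100% as the node count increases.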

