On Memory Optimal Code Generation for Exposed Datapath Architectures with Buffered Processing Units

2018 18th International Conference on Application of Concurrency to System Design (ACSD). Pub Date: 2018-06-01. DOI: 10.1109/ACSD.2018.00020

Markus Anders, Anoop Bhagyanath, K. Schneider


Citations: 5

Abstract


One reason conventional processors make only limited use of instruction-level parallelism (ILP) is their reliance on registers. Some recent processor architectures therefore expose their datapaths to the compiler so that the compiler can move values directly between processing units. In particular, the Synchronous Control Asynchronous Dataflow (SCAD) machine is an exposed datapath architecture that uses FIFO buffers at the input and output ports of its processing units. Code generation techniques inspired by classic queue machines can completely eliminate the use of conventional registers in SCAD. However, bounded buffer sizes may still make spill code necessary to store values temporarily in main memory. Since memory accesses are expensive, they should be avoided wherever possible to reduce the execution time of programs. Memory-optimal code generation problems have been studied extensively for register machines and were proven to be NP-complete. In this paper, we prove that memory-optimal code generation for SCAD is also NP-complete by presenting a polynomial-time transformation from memory-optimal register code to memory-optimal SCAD code. In particular, we present a one-to-one correspondence between the registers of register machines and the buffer entries of SCAD machines, which indicates that these architectures are closer to each other than expected. Still, SCAD machines offer two important advantages. First, circuit implementations of buffers scale much better in size than register files, so that more space is available on SCAD machines with the same chip size. Second, the instruction set of SCAD does not depend on a fixed number of registers or buffers. We therefore present experimental results that compare the execution time of memory-optimal SCAD code with FIFO buffers against memory-optimal code based on conventional register allocation.
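The role of spill code under bounded buffer sizes can be illustrated with a deliberately simplified toy model. The sketch below is not the paper's algorithm and does not reflect the SCAD instruction set; the function name `run_with_bounded_fifo`, the single-buffer setup, and the cost model (one memory access per store and per load) are all assumptions made here for illustration. A producer enqueues values into one FIFO buffer of fixed capacity (assumed to be at least 1); values that do not fit are spilled to main memory and reloaded as the consumer frees entries.

```python
from collections import deque


def run_with_bounded_fifo(values, capacity):
    """Toy model: feed `values` through one FIFO buffer of size `capacity`.

    Returns (consumed_order, memory_accesses), where memory_accesses counts
    one access per spill store and one per reload. Hypothetical cost model,
    assuming capacity >= 1.
    """
    buffer = deque()   # entries currently held in the FIFO buffer
    spilled = deque()  # values temporarily stored in main memory
    stores = 0
    loads = 0

    # Producer phase: a value that does not fit into the buffer is spilled.
    for value in values:
        if len(buffer) < capacity:
            buffer.append(value)
        else:
            spilled.append(value)
            stores += 1

    # Consumer phase: take the head of the buffer, then reload the oldest
    # spilled value into the freed entry so that FIFO order is preserved.
    consumed = []
    while buffer:
        consumed.append(buffer.popleft())
        if spilled:
            buffer.append(spilled.popleft())
            loads += 1

    return consumed, stores + loads


if __name__ == "__main__":
    print(run_with_bounded_fifo([1, 2, 3, 4, 5], capacity=3))
    print(run_with_bounded_fifo([1, 2, 3], capacity=3))
```

In this toy setting every value beyond the buffer capacity costs two memory accesses, which is the kind of overhead that memory-optimal code generation tries to minimize. The actual problem treated in the paper involves several processing units and the ordering of values across their buffers, which this sketch does not attempt to model.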
