On Memory Optimal Code Generation for Exposed Datapath Architectures with Buffered Processing Units
Authors: Markus Anders, Anoop Bhagyanath, K. Schneider
Venue: 2018 18th International Conference on Application of Concurrency to System Design (ACSD)
Publication date: 2018-06-01
DOI: 10.1109/ACSD.2018.00020 (https://doi.org/10.1109/ACSD.2018.00020)
Citations: 5
Abstract
One reason conventional processors exploit only limited instruction-level parallelism (ILP) is their use of registers. Some recent processor architectures therefore expose their datapaths to the compiler so that the compiler can move values directly between processing units. In particular, the Synchronous Control Asynchronous Dataflow (SCAD) machine is an exposed datapath architecture that uses FIFO buffers at the input and output ports of its processing units. Code generation techniques inspired by classic queue machines can completely eliminate the use of conventional registers in SCAD. However, bounded buffer sizes may still make spill code necessary to store values temporarily in main memory. Since memory access is expensive, it should be avoided wherever possible to improve the execution time of programs. Memory-optimal code generation problems have been studied extensively for register machines and were proven to be NP-complete. In this paper, we prove that memory-optimal code generation for SCAD is also NP-complete by presenting a polynomial-time transformation from memory-optimal register code to memory-optimal SCAD code. In particular, we present a one-to-one correspondence between the registers of register machines and the buffer entries of SCAD machines, which indicates that these architectures are closer to each other than expected. Still, SCAD machines offer important advantages. First, circuit implementations of buffers scale much better in size than register files, so that more space is available on SCAD machines with the same chip size. Second, the instruction set of SCAD does not depend on a fixed number of registers or buffers. We therefore present experimental results that compare the execution time of memory-optimal SCAD code with FIFO buffers against memory-optimal code based on conventional register allocation.
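As a rough illustration of the queue-machine style of evaluation that the abstract refers to, the following toy Python sketch evaluates an expression using a single FIFO queue instead of registers and records the peak number of queue entries in use. It is not the paper's code generator and does not model the SCAD instruction set; the names `run_queue_program`, `OPS`, and `excess_entries`, the sample expression, and the buffer capacity of 3 are all assumptions made for illustration only.

```python
from collections import deque
import operator

# Hypothetical operator table for this toy example.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_queue_program(program, env):
    """Evaluate a queue-machine program; return (result, peak queue occupancy)."""
    queue = deque()
    peak = 0
    for token in program:
        if token in OPS:
            # A binary operator consumes the two oldest entries (FIFO order)
            # and enqueues its result at the tail.
            left = queue.popleft()
            right = queue.popleft()
            queue.append(OPS[token](left, right))
        else:
            # An operand: look up the variable and enqueue its value.
            queue.append(env[token])
        peak = max(peak, len(queue))
    assert len(queue) == 1, "a well-formed program leaves exactly one result"
    return queue.popleft(), peak


def excess_entries(peak, capacity):
    """Entries that would not fit into a buffer of the given capacity."""
    return max(0, peak - capacity)


# (a + b) * (c - d), emitted in level order: deepest level first, left to right.
program = ["a", "b", "c", "d", "+", "-", "*"]
result, peak = run_queue_program(program, {"a": 1, "b": 2, "c": 7, "d": 3})
overflow = excess_entries(peak, capacity=3)  # capacity 3 is an assumed bound
```

In this toy model, the program needs four queue entries at its peak, so an assumed buffer bound of three leaves one value that would have to be held elsewhere, for example spilled to main memory. Deciding which values to spill, and in what order to schedule operations so that as few as possible are spilled, is the optimization problem the abstract describes; the sketch only makes the buffer pressure visible.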