Lazy Data Routing And Greedy Scheduling For Application-specific Signal Processors

[1988] Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture - MICRO '21 Pub Date : 1988-01-03 DOI:10.1145/62504.62676

K. Rimey, P. Hilfinger


Citations: 21

Abstract

This paper concerns code generation for a troublesome class of horizontal-instruction-word architectures (whose machine language resembles horizontal microcode). These are application-specific processors, minimalistic programmable processors to be incorporated into application-specific signal processing chips. The processors of interest afford some opportunity for pipelined and for parallel operation of functional units, but do not provide enough bandwidth to store intermediate results in memory or in a register file. Instead, a typical datapath provides direct connections between functional units (often through pipeline registers), forming an irregular network. The usual way to generate horizontal code is to first generate a loose sequence of microoperations (vertical code) and then pack these tightly into instructions in a compaction post-pass. Local compaction, which packs one straight-line code segment at a time, is now well understood; the research community has largely shifted its attention to global compaction. For our application-specific processors, however, packing microoperations in a separate pass works poorly, and generating good horizontal code for even straight-line code segments presents a challenge. Not only must the code generator choose which functional units to use; it must also choose how to route each intermediate result from the output of one functional unit to the input of another. This task is called data routing. How best to route a particular value depends on the time interval between its definition and use or uses, as well as on the datapath resources that are free during that interval. For this reason we abandon the compaction post-pass, and instead pack or schedule microoperations as they are generated. We consider only local scheduling in this paper. Our local scheduler is similar to the “operation scheduler” developed by Fisher et al. [1] for use in a trace-scheduling compiler for a VLIW supercomputer. However, we consider machines in which intermediate results must often reside in hot spots, such as busses and latches as well as registers, that would obstruct computation if tied up. Like Fisher et al.,
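
To make the dependence between routing and scheduling concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical toy datapath with one multiplier (`mul`), one adder (`add`), a direct connection that delivers a result to a consumer issued in the very next cycle, and a single latch that can hold one value over longer intervals. The names `schedule`, `plan_routes`, and `horizon` are invented for this illustration. Each operation is placed greedily at the earliest cycle that works, and the route for each operand is chosen only at that point, once the interval between definition and use is known.

```python
def plan_routes(args, done, cycle, latch_owner):
    """Pick a route for each operand of an op issued at `cycle`, or None if stuck."""
    plan = []
    claimed = {}                          # latch cycles claimed by this plan
    for a in args:
        if a not in done:                 # primary input: assumed always available
            continue
        first, last = done[a] + 1, cycle - 1
        if first > last:                  # consumed in the next cycle: direct connection
            plan.append((a, "direct", done[a], cycle))
            continue
        hold = range(first, last + 1)     # cycles the value must wait in the latch
        if any(latch_owner.get(c, a) != a or claimed.get(c, a) != a for c in hold):
            return None                   # the latch holds some other value then
        claimed.update((c, a) for c in hold)
        plan.append((a, "latch", first, last))
    return plan


def schedule(ops, horizon=64):
    """Greedily schedule ops given as (name, unit, operands) in dependence order."""
    unit_busy = set()                     # (unit, cycle) pairs already taken
    latch_owner = {}                      # cycle -> value held in the single latch
    done = {}                             # op name -> issue cycle
    routes = []
    for name, kind, args in ops:
        # Toy assumption: every unit has a latency of one cycle.
        start = max((done[a] + 1 for a in args if a in done), default=0)
        for cycle in range(start, horizon):
            if (kind, cycle) in unit_busy:
                continue
            plan = plan_routes(args, done, cycle, latch_owner)
            if plan is not None:
                break
        else:
            # A greedy scheduler with no backtracking can paint itself into a corner.
            raise RuntimeError(f"no slot for {name} within {horizon} cycles")
        unit_busy.add((kind, cycle))
        done[name] = cycle
        for value, route, first, last in plan:
            if route == "latch":
                latch_owner.update((c, value) for c in range(first, last + 1))
        routes.extend(plan)
    return done, routes


ops = [
    ("p", "mul", ["x", "y"]),
    ("q", "mul", ["x", "z"]),
    ("s", "add", ["p", "q"]),
]
done, routes = schedule(ops)
print(done)    # {'p': 0, 'q': 1, 's': 2}
print(routes)  # [('p', 'latch', 1, 1), ('q', 'direct', 1, 2)]
```

In this run `p` must wait one cycle for `q`, so it is routed through the latch, while `q` reaches the adder over the direct connection. Because the toy model has only one latch and never backtracks, some operation sequences exhaust `horizon` and raise an error, which hints at why tying up such a resource is costly.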
