Lazy Data Routing And Greedy Scheduling For Application-specific Signal Processors
K. Rimey, P. Hilfinger
[1988] Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture - MICRO '21
Published: 1988-01-03
DOI: 10.1145/62504.62676 (https://doi.org/10.1145/62504.62676)
Citation count: 21
Abstract
This paper concerns code generation for a troublesome class of horizontal-instruction-word architectures (whose machine language resembles horizontal microcode). These are application-specific processors, minimalistic programmable processors to be incorporated into application-specific signal processing chips. The processors of interest afford some opportunity for pipelined and for parallel operation of functional units, but do not provide enough bandwidth to store intermediate results in memory or in a register file. Instead, a typical datapath provides direct connections between functional units (often through pipeline registers), forming an irregular network. The usual way to generate horizontal code is to first generate a loose sequence of microoperations (vertical code) and then pack these tightly into instructions in a compaction post-pass. Local compaction, which packs one straight-line code segment at a time, is now well understood; the research community has largely shifted its attention to global compaction. For our application-specific processors, however, packing microoperations in a separate pass works poorly, and generating good horizontal code for even straight-line code segments presents a challenge. Not only must the code generator choose which functional units to use; it must also choose how to route each intermediate result from the output of one functional unit to the input of another. This task is called data routing. How best to route a particular value depends on the time interval between its definition and its use or uses, as well as on the datapath resources that are free during that interval. For this reason we abandon the compaction post-pass, and instead pack or schedule microoperations as they are generated. We consider only local scheduling in this paper. Our local scheduler is similar to the “operation scheduler” developed by Fisher et al. [1] for use in a trace-scheduling compiler for a VLIW supercomputer. However, we consider machines in which intermediate results must often reside in hot spots such as busses and latches as well as registers that would obstruct computation if tied up. Like Fisher et al.,
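To make the interaction between scheduling and data routing concrete, the following is a minimal Python sketch, not the authors' algorithm (which this excerpt does not describe). It assumes a toy model in which every name is hypothetical: `ReservationTable`, `pick_routes`, `schedule`, the `routes` map from a (producer unit, consumer unit) pair to candidate holding resources, a uniform one-cycle latency, and operations that use each operand only once. Each microoperation is placed greedily at the earliest cycle where its functional unit is free, and the route for each operand is chosen only at that moment, when the interval between definition and use is known.

```python
from collections import defaultdict


class ReservationTable:
    """Records which datapath resources are occupied in each cycle."""

    def __init__(self):
        self.busy = defaultdict(set)  # cycle -> names of occupied resources

    def is_free(self, resource, cycles):
        return all(resource not in self.busy[c] for c in cycles)

    def reserve(self, resource, cycles):
        for c in cycles:
            self.busy[c].add(resource)


def pick_routes(table, issue, unit_of, routes, unit, operands, cycle, latency):
    """Choose a holding resource for every operand, or return None.

    A value must be held from the cycle it becomes available until the
    cycle in which its consumer issues (hypothetical timing model).
    """
    picks = {}
    claimed = set()  # (resource, cycle) pairs taken by earlier operands
    for p in operands:
        span = list(range(issue[p] + latency, cycle + 1))
        for r in routes.get((unit_of[p], unit), []):
            clash = any((r, c) in claimed for c in span)
            if table.is_free(r, span) and not clash:
                picks[p] = (r, span)
                claimed.update((r, c) for c in span)
                break
        else:
            return None  # no candidate resource is free for the whole span
    return picks


def schedule(ops, routes, latency=1, horizon=64):
    """Greedily place ops (given in dependence order) and route operands.

    ops: list of (name, functional unit, [operand op names]).
    Returns (issue cycle per op, chosen resource per (producer, consumer)).
    """
    table = ReservationTable()
    issue, unit_of, routing = {}, {}, {}
    for name, unit, operands in ops:
        earliest = max((issue[p] + latency for p in operands), default=0)
        for cycle in range(earliest, horizon):
            if not table.is_free(unit, [cycle]):
                continue
            picks = pick_routes(table, issue, unit_of, routes,
                                unit, operands, cycle, latency)
            if picks is None:
                continue
            table.reserve(unit, [cycle])
            for p, (r, span) in picks.items():
                table.reserve(r, span)
                routing[(p, name)] = r
            issue[name], unit_of[name] = cycle, unit
            break
        else:
            raise RuntimeError(f"no feasible slot for {name} within horizon")
    return issue, routing


# Two products share one multiplier, then feed one adder.
ops = [("m1", "mul", []), ("m2", "mul", []), ("a1", "add", ["m1", "m2"])]
routes = {("mul", "add"): ["latch", "bus"]}  # candidates in preference order
issue, routing = schedule(ops, routes)
```

In this example the two multiplications must issue in different cycles, so `m1` has to be held longer than `m2`. Both values are live in the cycle where the addition issues, so they cannot share the latch and one of them is routed over the bus. A separate compaction pass would not know these intervals when the routes were first chosen, which is the difficulty the abstract points to.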