A unified modulo scheduling and register allocation technique for clustered processors

Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques. Pub Date: 2001-09-08. DOI: 10.1109/PACT.2001.953298

J. M. Codina, Jesús Sánchez, Antonio González


Citations: 51

Abstract


A unified modulo scheduling and register allocation technique for clustered processors

This work presents a modulo scheduling framework for clustered ILP processors that integrates cluster assignment, instruction scheduling and register allocation in a single phase. This unified approach is more effective than traditional approaches that perform some (or all) of the three steps sequentially, since it optimizes the global code generation problem instead of searching for an optimal solution to each individual step. It also avoids the iterative nature of traditional approaches, which must reapply the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on the fly and heuristics that evaluate the quality of partial schedules by simultaneously considering inter-cluster communications, memory pressure and register pressure. It also includes transformations that trade pressure on one type of resource for pressure on another. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for SPECfp95 is 36% for a 4-cluster configuration.
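The abstract gives no algorithmic detail, so the following is only a toy illustration of what choosing the cluster and the cycle together in one pass might look like. It is not the authors' framework. Everything here is an assumption made for illustration: the function names `res_mii` and `unified_schedule`, the input format, the single-unit-type machine model, and the cost weights. Register allocation, spill code, memory pressure and loop-carried dependences are all left out.

```python
from math import ceil


def res_mii(num_ops, clusters, units_per_cluster):
    """Resource-constrained lower bound on the initiation interval (II)."""
    return ceil(num_ops / (clusters * units_per_cluster))


def unified_schedule(ops, deps, clusters, units_per_cluster,
                     latency=1, comm_delay=1, comm_weight=2):
    """Pick a (cluster, cycle) pair for every operation in one greedy pass.

    ops  : operation names in topological order (hypothetical input format)
    deps : maps an operation to the operations whose results it consumes
    Returns (ii, {op: (cluster, cycle)}).
    """
    ii = res_mii(len(ops), clusters, units_per_cluster)
    # Modulo reservation table: busy[c][s] counts units of cluster c used in slot s.
    busy = [[0] * ii for _ in range(clusters)]
    placed = {}
    for op in ops:
        best = None
        for c in range(clusters):
            start, comms = 0, 0
            for producer in deps.get(op, []):
                p_cluster, p_cycle = placed[producer]
                remote = p_cluster != c
                if remote:
                    comms += 1  # value must be copied between clusters
                ready = p_cycle + latency + (comm_delay if remote else 0)
                start = max(start, ready)
            # Scanning ii consecutive cycles visits every modulo slot once.
            for t in range(start, start + ii):
                if busy[c][t % ii] < units_per_cluster:
                    # Assumed cost: issue cycle plus a penalty per inter-cluster copy.
                    cost = t + comm_weight * comms
                    if best is None or (cost, c) < best[:2]:
                        best = (cost, c, t)
                    break
        # With ii >= ResMII the table always has spare capacity, so a slot exists.
        assert best is not None
        _, c, t = best
        busy[c][t % ii] += 1
        placed[op] = (c, t)
    return ii, placed


if __name__ == "__main__":
    loop_ops = ["a", "b", "c", "d"]
    loop_deps = {"c": ["a", "b"], "d": ["c"]}
    ii, placement = unified_schedule(loop_ops, loop_deps,
                                     clusters=2, units_per_cluster=1)
    print("II =", ii)
    for name, (cluster, cycle) in placement.items():
        print(f"{name}: cluster {cluster}, cycle {cycle}")
```

Because each candidate slot is scored on both its issue cycle and the copies it would force, the cluster decision and the scheduling decision are made at the same time rather than in separate phases. A framework of the kind the abstract describes would presumably extend such a score with register and memory pressure terms.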
