{"title":"分布式模调度","authors":"M. M. Fernandes, J. Llosa, N. Topham","doi":"10.1109/HPCA.1999.744349","DOIUrl":null,"url":null,"abstract":"Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units. Organizations composed of clusters of a few functional units and small private register files have been proposed to deal with this problem; an approach highly dependent on scheduling and partitioning strategies. The paper presents DMS, an algorithm that integrates modulo scheduling and code partitioning in a single procedure. Experimental results have shown that the algorithm is effective for configurations up to 8 clusters, or even more when targeting vectorizable loops.","PeriodicalId":287867,"journal":{"name":"Proceedings Fifth International Symposium on High-Performance Computer Architecture","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":"{\"title\":\"Distributed modulo scheduling\",\"authors\":\"M. M. Fernandes, J. Llosa, N. Topham\",\"doi\":\"10.1109/HPCA.1999.744349\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units. Organizations composed of clusters of a few functional units and small private register files have been proposed to deal with this problem; an approach highly dependent on scheduling and partitioning strategies. The paper presents DMS, an algorithm that integrates modulo scheduling and code partitioning in a single procedure. Experimental results have shown that the algorithm is effective for configurations up to 8 clusters, or even more when targeting vectorizable loops.\",\"PeriodicalId\":287867,\"journal\":{\"name\":\"Proceedings Fifth International Symposium on High-Performance Computer Architecture\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"48\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Fifth International Symposium on High-Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.1999.744349\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Fifth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.1999.744349","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units. Organizations composed of clusters of a few functional units and small private register files have been proposed to deal with this problem; an approach highly dependent on scheduling and partitioning strategies. The paper presents DMS, an algorithm that integrates modulo scheduling and code partitioning in a single procedure. Experimental results have shown that the algorithm is effective for configurations up to 8 clusters, or even more when targeting vectorizable loops.