Staggered distribution: a loop allocation scheme for dataflow multiprocessor systems
J. T. Lim, A. Hurson, B. Lee, B. Shirazi
[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation, 19 October 1992
DOI: 10.1109/FMPC.1992.234944
Citations: 5
Abstract
The authors present a staggered distribution scheme for DOACROSS loops. The scheme uses heuristics to distribute loop iterations unevenly among processors in order to mask the delay caused by data dependencies and inter-PE (processing element) communication. Simulation results show that the scheme is effective for loops with a large degree of parallelism among iterations. By design, the scheme distributes loop iterations among PEs according to architectural characteristics of the underlying organization, i.e., processor speed and communication cost. The maximum speedup attained is very close to the maximum speedup possible for a given loop, even in the presence of inter-PE communication cost. The scheme also uses processors more efficiently: relative to the equal distribution approach, it requires fewer processors to attain maximum speedup. Although the scheme produces an unbalanced distribution among processors, this can be remedied by considering other loops when making the distribution, so that the overall load across processors is balanced.
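To make the abstract's terms (DOACROSS dependence delay, inter-PE communication cost, equal distribution, maximum speedup) concrete, the sketch below evaluates a DOACROSS loop under a deliberately simple, hypothetical timing model. It is not the authors' simulator and does not implement their staggered heuristic. It assumes that every iteration takes `body` time units, that iteration i can start no earlier than `delay` units after iteration i-1 starts (plus `comm` units when the two run on different PEs), and that each PE runs its iterations one at a time in index order. The names `doacross_makespan` and `cyclic` and all parameter values are illustrative assumptions.

```python
from __future__ import annotations

from typing import Sequence


def doacross_makespan(pe_of: Sequence[int], body: float, delay: float, comm: float) -> float:
    """Completion time of a DOACROSS loop under a hypothetical timing model.

    Assumptions (illustrative only, not taken from the paper):
    - pe_of[i] is the PE that runs iteration i;
    - iteration i (i >= 1) may start `delay` units after iteration i-1 starts,
      plus `comm` units if the two iterations run on different PEs;
    - every iteration takes `body` units, and a PE runs one iteration at a time.
    """
    pe_free: dict[int, float] = {}  # time at which each PE becomes idle
    prev_start = 0.0
    makespan = 0.0
    for i, pe in enumerate(pe_of):
        ready = 0.0
        if i > 0:
            # Dependence on iteration i-1, with extra cost when it crosses PEs.
            ready = prev_start + delay + (comm if pe != pe_of[i - 1] else 0.0)
        start = max(ready, pe_free.get(pe, 0.0))
        finish = start + body
        pe_free[pe] = finish
        prev_start = start
        makespan = max(makespan, finish)
    return makespan


def cyclic(n: int, p: int) -> list[int]:
    """Equal (round-robin) distribution: iteration i goes to PE i mod p."""
    return [i % p for i in range(n)]


n, body, delay = 100, 10.0, 2.0  # hypothetical loop parameters
seq = n * body  # sequential execution time
# No assignment can beat this: iteration i cannot start before i * delay.
bound = seq / ((n - 1) * delay + body)
print(f"upper bound on speedup: {bound:.2f}")
for comm in (0.0, 1.0):
    for p in (1, 2, 5, 10):
        t = doacross_makespan(cyclic(n, p), body, delay, comm)
        print(f"comm={comm} PEs={p:2d} speedup={seq / t:.2f}")
```

With these assumed parameters, the round-robin distribution reaches the bound of about 4.81 with 5 PEs when communication is free. With a communication cost of 1 per cross-PE dependence, its speedup levels off near 3.26 however many PEs are added, because every dependence now crosses PEs. That is the kind of delay the abstract says the staggered scheme is designed to mask.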