{"title":"Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures","authors":"Dajiang Liu, S. Yin, Leibo Liu, Shaojun Wei","doi":"10.1109/FCCM.2014.19","DOIUrl":null,"url":null,"abstract":"A coarse-grained reconfigurable architecture is a promising architecture with high power efficiency, which is typically composed of a host controller and a processing element array (PEA). Loops are often mapped onto PEAs for acceleration. In previous work, innermost loop is pipelined, and the the maximal number of concurrently executable operators (CEOs) in the kernel is limited by the inner loop. The loop body DFG of the input 2D nested loop with a inner loop carried dependence ([0,1]) and outer loop carried dependence ([1,1]). We would map this loop onto a 4×4 PEA with pipelining. We assume that the latency of executing one loop iteration is Lb, and the number of iterations involved at one cycle in the kernel phase of pipelining is Wk. As there is a inner loop dependence ([0,1]), the initiation interval (IIi) of inner loop pipelining could be minimized to 1 and we get Wk = 4. We also note that the angle α is contained by two sides in Figure 1(b), which could be written as follow: tan(α) = Wk/Lb = 1/IIi.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2014.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
A coarse-grained reconfigurable architecture is a promising architecture with high power efficiency, which is typically composed of a host controller and a processing element array (PEA). Loops are often mapped onto PEAs for acceleration. In previous work, innermost loop is pipelined, and the the maximal number of concurrently executable operators (CEOs) in the kernel is limited by the inner loop. The loop body DFG of the input 2D nested loop with a inner loop carried dependence ([0,1]) and outer loop carried dependence ([1,1]). We would map this loop onto a 4×4 PEA with pipelining. We assume that the latency of executing one loop iteration is Lb, and the number of iterations involved at one cycle in the kernel phase of pipelining is Wk. As there is a inner loop dependence ([0,1]), the initiation interval (IIi) of inner loop pipelining could be minimized to 1 and we get Wk = 4. We also note that the angle α is contained by two sides in Figure 1(b), which could be written as follow: tan(α) = Wk/Lb = 1/IIi.