Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng
{"title":"HW/SW approaches to accelerate GRAPES in an FU array","authors":"Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng","doi":"10.1109/CoolChips.2013.6547920","DOIUrl":null,"url":null,"abstract":"In this research, a high performance computing weather forecasting application GRAPES has been tuned onto a functional unit (FU) array based architecture. Software and hardware approaches are specifically employed to increase the data locality and data reuse to accelerate the stencil computation in GRAPES. The simulation results indicate that we can achieve a per-core average IPC of 12.3 within a 20-stage FU array processor, which has a 5.8x power-efficiency boost than the many-core processor (MCP) of a same process technology. This can accordingly slow down the increase of communication by one order in the cluster system, resulting in a 12x power-efficiency boost in all PEs.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE COOL Chips XVI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoolChips.2013.6547920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In this research, a high performance computing weather forecasting application GRAPES has been tuned onto a functional unit (FU) array based architecture. Software and hardware approaches are specifically employed to increase the data locality and data reuse to accelerate the stencil computation in GRAPES. The simulation results indicate that we can achieve a per-core average IPC of 12.3 within a 20-stage FU array processor, which has a 5.8x power-efficiency boost than the many-core processor (MCP) of a same process technology. This can accordingly slow down the increase of communication by one order in the cluster system, resulting in a 12x power-efficiency boost in all PEs.