Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng
{"title":"在FU阵列中加速GRAPES的硬件/软件方法","authors":"Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng","doi":"10.1109/CoolChips.2013.6547920","DOIUrl":null,"url":null,"abstract":"In this research, a high performance computing weather forecasting application GRAPES has been tuned onto a functional unit (FU) array based architecture. Software and hardware approaches are specifically employed to increase the data locality and data reuse to accelerate the stencil computation in GRAPES. The simulation results indicate that we can achieve a per-core average IPC of 12.3 within a 20-stage FU array processor, which has a 5.8x power-efficiency boost than the many-core processor (MCP) of a same process technology. This can accordingly slow down the increase of communication by one order in the cluster system, resulting in a 12x power-efficiency boost in all PEs.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HW/SW approaches to accelerate GRAPES in an FU array\",\"authors\":\"Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng\",\"doi\":\"10.1109/CoolChips.2013.6547920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this research, a high performance computing weather forecasting application GRAPES has been tuned onto a functional unit (FU) array based architecture. Software and hardware approaches are specifically employed to increase the data locality and data reuse to accelerate the stencil computation in GRAPES. The simulation results indicate that we can achieve a per-core average IPC of 12.3 within a 20-stage FU array processor, which has a 5.8x power-efficiency boost than the many-core processor (MCP) of a same process technology. This can accordingly slow down the increase of communication by one order in the cluster system, resulting in a 12x power-efficiency boost in all PEs.\",\"PeriodicalId\":340576,\"journal\":{\"name\":\"2013 IEEE COOL Chips XVI\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE COOL Chips XVI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CoolChips.2013.6547920\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE COOL Chips XVI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoolChips.2013.6547920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
HW/SW approaches to accelerate GRAPES in an FU array
In this research, a high performance computing weather forecasting application GRAPES has been tuned onto a functional unit (FU) array based architecture. Software and hardware approaches are specifically employed to increase the data locality and data reuse to accelerate the stencil computation in GRAPES. The simulation results indicate that we can achieve a per-core average IPC of 12.3 within a 20-stage FU array processor, which has a 5.8x power-efficiency boost than the many-core processor (MCP) of a same process technology. This can accordingly slow down the increase of communication by one order in the cluster system, resulting in a 12x power-efficiency boost in all PEs.