{"title":"RpStacks:快速和准确的处理器设计空间探索使用代表性的停滞事件堆栈","authors":"Jaewon Lee, Hanhwi Jang, Jangwoo Kim","doi":"10.1109/MICRO.2014.26","DOIUrl":null,"url":null,"abstract":"CPU architects perform a series of slow timing simulations to explore large processor design space. To minimize the exploration overhead, architects make their best efforts to accelerate each simulation step as well as reduce the number of simulations by predicting the exact performance of designs. However, the existing methods are either too slow to overcome the large number of design points, or inaccurate to safely substitute extra simulation steps with performance predictions. In this paper, we propose RpStacks, a fast and accurate processor design space exploration method to 1) identify the current design point's key performance bottlenecks and 2) estimate the exact impacts of latency adjustments without launching an extra step of simulations. The key idea is to selectively collect the information about performance-critical events from a single simulation, construct a small number of event stacks describing the latency of distinctive execution paths, and estimate the overall performance as well as stall-event composition using the stacks. Our proposed method significantly outperforms the existing design space exploration methods in terms of both the latency and the accuracy. For investigating 1,000 design points, RpStacks achieves 26 times speedup on average over a variety of applications while showing high accuracy, when compared to a popular x86 timing simulator.","PeriodicalId":6591,"journal":{"name":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","volume":"25 1","pages":"255-267"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"RpStacks: Fast and Accurate Processor Design Space Exploration Using Representative Stall-Event Stacks\",\"authors\":\"Jaewon Lee, Hanhwi Jang, Jangwoo Kim\",\"doi\":\"10.1109/MICRO.2014.26\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"CPU architects perform a series of slow timing simulations to explore large processor design space. To minimize the exploration overhead, architects make their best efforts to accelerate each simulation step as well as reduce the number of simulations by predicting the exact performance of designs. However, the existing methods are either too slow to overcome the large number of design points, or inaccurate to safely substitute extra simulation steps with performance predictions. In this paper, we propose RpStacks, a fast and accurate processor design space exploration method to 1) identify the current design point's key performance bottlenecks and 2) estimate the exact impacts of latency adjustments without launching an extra step of simulations. The key idea is to selectively collect the information about performance-critical events from a single simulation, construct a small number of event stacks describing the latency of distinctive execution paths, and estimate the overall performance as well as stall-event composition using the stacks. Our proposed method significantly outperforms the existing design space exploration methods in terms of both the latency and the accuracy. For investigating 1,000 design points, RpStacks achieves 26 times speedup on average over a variety of applications while showing high accuracy, when compared to a popular x86 timing simulator.\",\"PeriodicalId\":6591,\"journal\":{\"name\":\"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture\",\"volume\":\"25 1\",\"pages\":\"255-267\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MICRO.2014.26\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MICRO.2014.26","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
RpStacks: Fast and Accurate Processor Design Space Exploration Using Representative Stall-Event Stacks
CPU architects perform a series of slow timing simulations to explore large processor design space. To minimize the exploration overhead, architects make their best efforts to accelerate each simulation step as well as reduce the number of simulations by predicting the exact performance of designs. However, the existing methods are either too slow to overcome the large number of design points, or inaccurate to safely substitute extra simulation steps with performance predictions. In this paper, we propose RpStacks, a fast and accurate processor design space exploration method to 1) identify the current design point's key performance bottlenecks and 2) estimate the exact impacts of latency adjustments without launching an extra step of simulations. The key idea is to selectively collect the information about performance-critical events from a single simulation, construct a small number of event stacks describing the latency of distinctive execution paths, and estimate the overall performance as well as stall-event composition using the stacks. Our proposed method significantly outperforms the existing design space exploration methods in terms of both the latency and the accuracy. For investigating 1,000 design points, RpStacks achieves 26 times speedup on average over a variety of applications while showing high accuracy, when compared to a popular x86 timing simulator.