Stijn Eyerman, L. Eeckhout, T. Karkhanis, James E. Smith
{"title":"用于计算精确CPI组件的性能计数器架构","authors":"Stijn Eyerman, L. Eeckhout, T. Karkhanis, James E. Smith","doi":"10.1145/1168857.1168880","DOIUrl":null,"url":null,"abstract":"A common way of representing processor performance is to use Cycles per Instruction (CPI) `stacks' which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software application developers and computer architects. However, computing CPI stacks on superscalar out-of-order processors is challenging because of various overlaps among execution and miss events (cache misses, TLB misses, and branch mispredictions).This paper shows that meaningful and accurate CPI stacks can be computed for superscalar out-of-order processors. Using interval analysis, a novel method for analyzing out-of-order processor performance, we gain understanding into the performance impact of the various miss events. Based on this understanding, we propose a novel way of architecting hardware performance counters for building accurate CPI stacks. The additional hardware for implementing these counters is limited and comparable to existing hardware performance counter architectures while being significantly more accurate than previous approaches.","PeriodicalId":270694,"journal":{"name":"ASPLOS XII","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"182","resultStr":"{\"title\":\"A performance counter architecture for computing accurate CPI components\",\"authors\":\"Stijn Eyerman, L. Eeckhout, T. Karkhanis, James E. Smith\",\"doi\":\"10.1145/1168857.1168880\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A common way of representing processor performance is to use Cycles per Instruction (CPI) `stacks' which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software application developers and computer architects. However, computing CPI stacks on superscalar out-of-order processors is challenging because of various overlaps among execution and miss events (cache misses, TLB misses, and branch mispredictions).This paper shows that meaningful and accurate CPI stacks can be computed for superscalar out-of-order processors. Using interval analysis, a novel method for analyzing out-of-order processor performance, we gain understanding into the performance impact of the various miss events. Based on this understanding, we propose a novel way of architecting hardware performance counters for building accurate CPI stacks. The additional hardware for implementing these counters is limited and comparable to existing hardware performance counter architectures while being significantly more accurate than previous approaches.\",\"PeriodicalId\":270694,\"journal\":{\"name\":\"ASPLOS XII\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"182\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASPLOS XII\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1168857.1168880\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS XII","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1168857.1168880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A performance counter architecture for computing accurate CPI components
A common way of representing processor performance is to use Cycles per Instruction (CPI) `stacks' which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software application developers and computer architects. However, computing CPI stacks on superscalar out-of-order processors is challenging because of various overlaps among execution and miss events (cache misses, TLB misses, and branch mispredictions).This paper shows that meaningful and accurate CPI stacks can be computed for superscalar out-of-order processors. Using interval analysis, a novel method for analyzing out-of-order processor performance, we gain understanding into the performance impact of the various miss events. Based on this understanding, we propose a novel way of architecting hardware performance counters for building accurate CPI stacks. The additional hardware for implementing these counters is limited and comparable to existing hardware performance counter architectures while being significantly more accurate than previous approaches.