{"title":"Exploiting procedure level locality to reduce instruction cache misses","authors":"Ravi V. Batchu, Daniel A. Jiménez","doi":"10.1109/INTERA.2004.1299512","DOIUrl":"https://doi.org/10.1109/INTERA.2004.1299512","url":null,"abstract":"High instruction fetch bandwidth is essential for high performance in today's wide-issue out-of-order processors. Instruction caches must provide a low miss rate as well as low latency. We introduce procedure level relocation, a class of dynamic feedback-directed optimizations that substantially reduce the instruction cache miss rate by exploiting the temporal locality of procedure usage. Based on the observation that half of all procedures executed are at most 128 bytes in length, we present a small procedure cache, a small and fast explicitly managed memory for storing small procedures. We show that procedure level relocation into a small procedure cache reduces the instruction cache miss rate by an average of 15%.","PeriodicalId":262940,"journal":{"name":"Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004.","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129674136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous trip count profiling for loop optimization in two-phase dynamic binary translators","authors":"Youfeng Wu, M. Breternitz, Tevi Devor","doi":"10.1109/INTERA.2004.1299505","DOIUrl":"https://doi.org/10.1109/INTERA.2004.1299505","url":null,"abstract":"Most dynamic binary translators use a two-phase approach to identify and optimize frequently executed code dynamically. In the profiling phase, blocks of code are interpreted or translated without optimization to collect execution frequency information for the blocks. In the optimization phase, frequently executed blocks are grouped into regions and advanced optimizations are applied on them. This approach implicitly assumes that the initial execution of each block is representative of the block throughout its lifetime. In particular, loop optimizations may use the block frequency information to determine loop trip counts to guide their optimizations. If the trip count information is incorrect, however, a loop may be improperly optimized, and program performance suffers. In this paper we show that the initial profile is inadequate at predicting loop trip count information for several integer programs. We propose and evaluate efficient algorithms to continuously profile for trip count. Our results show that accurate trip count information may be obtained with very low overhead (about 0.5%). This enables advanced loop optimizations in dynamic binary translators.","PeriodicalId":262940,"journal":{"name":"Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004.","volume":"11 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125684434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}