{"title":"Instruction-level characterization of scientific computing applications using hardware performance counters","authors":"Yong Luo, K. Cameron","doi":"10.1109/WWC.1998.809368","DOIUrl":null,"url":null,"abstract":"The paper provides characterization methods based on empirical performance counter measurements. In particular, we provide an instruction-level characterization derived empirically in an effort to demonstrate how architectural limitations in underlying hardware will affect the performance of existing codes. Preliminary results provide promise in code characterization, and empirical/analytical modeling. These include the ability to quantify outstanding miss utilization and stall time attributable to architectural limitations in the CPU and the memory hierarchy. This work further promises insight into quantifying bounds for CPI/sub 0/ or the ideal CPI with infinite, perfect L1 cache. In general, if we can characterize workloads using parameters that are independent of architecture, such as this work, then we can more appropriately compare different architectures in an effort to direct processor/code development.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WWC.1998.809368","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
The paper provides characterization methods based on empirical performance counter measurements. In particular, we provide an instruction-level characterization derived empirically in an effort to demonstrate how architectural limitations in underlying hardware will affect the performance of existing codes. Preliminary results provide promise in code characterization, and empirical/analytical modeling. These include the ability to quantify outstanding miss utilization and stall time attributable to architectural limitations in the CPU and the memory hierarchy. This work further promises insight into quantifying bounds for CPI/sub 0/ or the ideal CPI with infinite, perfect L1 cache. In general, if we can characterize workloads using parameters that are independent of architecture, such as this work, then we can more appropriately compare different architectures in an effort to direct processor/code development.