Application Insight Through Performance Modeling
G. Marin, J. Mellor-Crummey
2007 IEEE International Performance, Computing, and Communications Conference (published 2007-04-11)
DOI: 10.1109/PCCC.2007.358880 (https://doi.org/10.1109/PCCC.2007.358880)

Tuning the performance of applications requires understanding the interactions between code and target architecture. This paper describes a performance modeling approach that not only makes accurate predictions about the behavior of an application on a target architecture for different inputs, but also provides guidance for tuning by highlighting the factors that limit performance in each section of a program. We introduce two new performance metrics that estimate the maximum gain expected from tuning different parts of an application, or from increasing the number of machine resources. We show how this metric helped identify a bottleneck in the ASCI Sweep3D benchmark where the lack of instruction-level parallelism limited performance. Transforming one frequently executed loop to ameliorate this bottleneck improved performance by 16% on an Itanium2 system.