Analyzing Performance of Multi-cores and Applications with Cache-aware Roofline Model
Authors: Diogo Marques, Helder Duarte, L. Sousa, A. Ilic
Published in: 2017 International Conference on High Performance Computing & Simulation (HPCS), 2017-07-01
DOI: 10.1109/HPCS.2017.158 (https://doi.org/10.1109/HPCS.2017.158)
Citations: 0
Abstract
To satisfy the growing computational demands of modern applications, contemporary processor architectures have introduced significant enhancements aimed at increasing attainable performance, such as a larger number of cores, a more capable memory subsystem and improvements to the processor pipeline [1]. These performance improvements, however, usually come with increased complexity at the architecture level, which imposes additional challenges when designing, prototyping and optimizing the execution of real-world applications on a given compute platform. Because application performance depends on multiple factors, e.g., multi-threading, vectorization efficiency and memory accesses, achieving the most efficient execution is not a trivial task, especially when aiming to fully exploit the capabilities of modern multi-core processors.