{"title":"L2 Cache Access Pattern Analysis using Static Profiling of an Application","authors":"Theodora Adufu, Yoonhee Kim","doi":"10.1109/COMPSAC57700.2023.00022","DOIUrl":null,"url":null,"abstract":"Cache management is a significant aspect of executing applications on GPUs. With the advancements in GPU architecture, issues such as data reuse, cache line eviction and data residency are to be considered for optimal performance. Frequency of data access from global memory has significant impacts on the performance of the application with increased latencies. However, the L2 cache data residency feature by NVIDIA promises to reduce the overheads associated with frequent data accesses. Through the information extracted from static profiling analysis, we quantitatively analyzed the frequency of data reuse by threads to determine whether an application has frequent data accesses or not. We also estimated the size of access policy window from which persistent data should be cached to avoid stalling of warps. Also with our proposed approach, we observed that L1 cache load throughput increased by 2.75% for GEMM, 0.33% for 2DConv St and 0.46% for 2DConv Large respectively as data was resident in the L2 cache.","PeriodicalId":296288,"journal":{"name":"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"179 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC57700.2023.00022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Cache management is a significant aspect of executing applications on GPUs. With advances in GPU architecture, issues such as data reuse, cache line eviction, and data residency must be considered to achieve optimal performance. Frequent data accesses from global memory significantly degrade application performance through increased latency. However, NVIDIA's L2 cache data residency feature promises to reduce the overheads associated with such frequent accesses. Using information extracted through static profiling analysis, we quantitatively analyzed the frequency of data reuse by threads to determine whether an application accesses data frequently. We also estimated the size of the access policy window from which persistent data should be cached to avoid stalling warps. With our proposed approach, we observed that L1 cache load throughput increased by 2.75% for GEMM, 0.33% for 2DConv St, and 0.46% for 2DConv Large, respectively, as data was resident in the L2 cache.
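The residency mechanism referred to in the abstract is exposed in the CUDA runtime as a per-stream access policy window over a persisting L2 set-aside. The following is a minimal illustrative sketch (not code from the paper) of how such a window could be configured for a frequently reused buffer; the buffer size, hit ratio, and stream usage are placeholder assumptions, not values derived from the authors' profiling method.

#include <cuda_runtime.h>

int main() {
    // Query the device's limit on how much L2 can be set aside for persisting accesses
    // (supported on compute capability 8.0 and later).
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Reserve a portion of L2 for persisting data.
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, prop.persistingL2CacheMaxSize);

    // Hypothetical frequently reused buffer; 4 MiB is an illustrative size only.
    float* d_data = nullptr;
    size_t window_bytes = 4 * 1024 * 1024;
    cudaMalloc(&d_data, window_bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Describe the access policy window over the reused region.
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = d_data;
    attr.accessPolicyWindow.num_bytes = window_bytes;   // must not exceed prop.accessPolicyMaxWindowSize
    attr.accessPolicyWindow.hitRatio  = 0.6f;           // fraction of accesses in the window marked persisting
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);

    // ... launch kernels on `stream` that repeatedly read d_data ...

    // Release the persisting cache lines once the reused data is no longer needed.
    cudaCtxResetPersistingL2Cache();
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return 0;
}

Setting hitRatio below 1.0 marks only a fraction of accesses within the window as persisting, which is the documented way to reduce thrashing when the window is larger than the reserved L2 set-aside; choosing the window size itself is the estimation problem the paper's static profiling approach addresses.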