System execution path profiling using hardware performance counters
Francis Giraldeau, Naser Ezzati-Jivan, M. Dagenais
2021 IEEE International Systems Conference (SysCon), published 2021-04-15
DOI: 10.1109/SysCon48628.2021.9447121
Abstract
The task critical execution path, obtained from a kernel trace, reports the time spent waiting for each task involved in a heterogeneous, distributed application. However, additional profiling is needed to identify and understand the problematic code associated with long-lasting path edges. Hardware counter sampling provides insight into software performance at the microarchitecture level, for instance by extracting the call stack every 100K CPU cycles to determine where execution time is spent. Similarly, extracting the call stack at the end of a long-waiting system call is often useful. This technique is readily available for both statically compiled and JIT-compiled code. However, interpreted code is executed indirectly on the processor, so the link between source statements and the executed machine instructions is missing. We propose an architecture that efficiently records call stacks along the execution path, including for interpreted programs, with low intrusiveness and while maintaining the abstraction boundary between the kernel, the interpreter, and the user code. The method consists of sending a signal from within the performance counter interrupt handler; the user-space code receiving the signal can then inspect and record the state of the program. We implemented a profiler for the CPython interpreter using this technique, and we studied the benefit, accuracy, and cost of the proposed technique compared with an all-kernel monitoring solution.
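The signal-based division of labor (the kernel only delivers a signal, and user-space code inspects the interpreter state) can be illustrated with a minimal Python sketch. This is not the authors' profiler. As an assumption, it replaces the signal sent from the performance counter interrupt handler with the POSIX `ITIMER_PROF` interval timer, which makes the kernel send `SIGPROF` after each interval of consumed CPU time. Python's standard library does not expose `perf_event_open(2)`, which a closer approximation would need. The handler walks CPython frame objects to recover the interpreted call stack. `SAMPLE_INTERVAL_S`, `on_sample`, `start_profiler`, `stop_profiler`, and `busy_work` are hypothetical names chosen for illustration.

```python
import collections
import signal

# ASSUMPTION: ITIMER_PROF/SIGPROF stands in for the paper's signal sent from the
# hardware performance counter interrupt handler. All names are hypothetical.
SAMPLE_INTERVAL_S = 0.001  # hypothetical: one sample per 1 ms of CPU time

# Maps a Python-level call stack (outermost frame first) to its sample count.
samples = collections.Counter()
_previous_handler = None


def on_sample(signum, frame):
    """User-space handler: record the interpreted call stack when signaled."""
    # CPython passes the Python frame that was executing when the handler ran.
    # Walking f_back yields the Python-level stack, which a native unwinder
    # would only see as interpreter C frames.
    stack = []
    while frame is not None:
        code = frame.f_code
        stack.append(f"{code.co_filename}:{code.co_name}")
        frame = frame.f_back
    samples[tuple(reversed(stack))] += 1


def start_profiler(interval=SAMPLE_INTERVAL_S):
    global _previous_handler
    _previous_handler = signal.signal(signal.SIGPROF, on_sample)
    # The kernel sends SIGPROF each time the process consumes `interval`
    # seconds of CPU time (user + system).
    signal.setitimer(signal.ITIMER_PROF, interval, interval)


def stop_profiler():
    signal.setitimer(signal.ITIMER_PROF, 0, 0)  # disarm the timer first
    # signal.signal() returns None if the old handler was not set from Python.
    restore = _previous_handler if _previous_handler is not None else signal.SIG_DFL
    signal.signal(signal.SIGPROF, restore)


def busy_work(n):
    """CPU-bound workload to be sampled."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    start_profiler()
    busy_work(2_000_000)
    stop_profiler()
    for stack, count in samples.most_common(3):
        print(count, " -> ".join(stack))
```

Running the module prints the most frequently sampled stacks. Because CPython runs Python-level signal handlers only in the main thread and only between bytecode instructions, each sample is taken slightly after the kernel delivers the signal. A sketch like this therefore needs to be checked against a kernel-side reference, which is the kind of accuracy comparison with an all-kernel solution that the abstract describes.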