A Static Binary Instrumentation Threading Model for Fast Memory Trace Collection

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI:10.1109/SC.Companion.2012.101

M. Laurenzano, Joshua Peraza, L. Carrington, Ananta Tiwari, W. A. Ward, R. Campbell

{"title":"A Static Binary Instrumentation Threading Model for Fast Memory Trace Collection","authors":"M. Laurenzano, Joshua Peraza, L. Carrington, Ananta Tiwari, W. A. Ward, R. Campbell","doi":"10.1109/SC.Companion.2012.101","DOIUrl":null,"url":null,"abstract":"In order to achieve a high level of performance, data intensive applications such as the real-time processing of surveillance feeds from unmanned aerial vehicles will require the strategic application of multi/many-core processors and coprocessors using a hybrid of inter-process message passing (e.g. MPI and SHMEM) and intra-process threading (e.g. pthreads and OpenMP). To facilitate program design decisions, memory traces gathered through binary instrumentation can be used to understand the low-level interactions between a data intensive code and the memory subsystem of a multi-core processor or many-core co-processor. Toward this end, this paper introduces the addition of threading support for PMaCs Efficient Binary Instrumentation Toolkit for Linux/x86 (PEBIL) and compares PEBILs threading model to the threading models of two other popular Linux/x86 binary instrumentation platforms - Pin and Dyninst - on both theoretical and empirical grounds. The empirical comparisons are based on experiments which collect memory address traces for the OpenMP-threaded implementations of the NASA Advanced Supercomputing Parallel Benchmarks (NPBs). This work shows that the overhead of collecting full memory address traces for multithreaded programs is higher in PEBIL (7.7x) than in Pin (4.7x), both of which are significantly lower than Dyninst (897x). This work also shows that PEBIL, uniquely, is able to take advantage of interval-based sampling of a memory address trace by rapidly disabling and re-enabling instrumentation at the transitions into and out of sampling periods in order to achieve significant decreases in the overhead of memory address trace collection. For collecting the memory address streams of each of the NPBs at a 10% sampling rate, PEBIL incurs an average slowdown of 2.9x compared to 4.4x with Pin and 897x with Dyninst.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"584 1","pages":"741-745"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.Companion.2012.101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

In order to achieve a high level of performance, data intensive applications such as the real-time processing of surveillance feeds from unmanned aerial vehicles will require the strategic application of multi/many-core processors and coprocessors using a hybrid of inter-process message passing (e.g. MPI and SHMEM) and intra-process threading (e.g. pthreads and OpenMP). To facilitate program design decisions, memory traces gathered through binary instrumentation can be used to understand the low-level interactions between a data intensive code and the memory subsystem of a multi-core processor or many-core co-processor. Toward this end, this paper introduces the addition of threading support for PMaCs Efficient Binary Instrumentation Toolkit for Linux/x86 (PEBIL) and compares PEBILs threading model to the threading models of two other popular Linux/x86 binary instrumentation platforms - Pin and Dyninst - on both theoretical and empirical grounds. The empirical comparisons are based on experiments which collect memory address traces for the OpenMP-threaded implementations of the NASA Advanced Supercomputing Parallel Benchmarks (NPBs). This work shows that the overhead of collecting full memory address traces for multithreaded programs is higher in PEBIL (7.7x) than in Pin (4.7x), both of which are significantly lower than Dyninst (897x). This work also shows that PEBIL, uniquely, is able to take advantage of interval-based sampling of a memory address trace by rapidly disabling and re-enabling instrumentation at the transitions into and out of sampling periods in order to achieve significant decreases in the overhead of memory address trace collection. For collecting the memory address streams of each of the NPBs at a 10% sampling rate, PEBIL incurs an average slowdown of 2.9x compared to 4.4x with Pin and 897x with Dyninst.

查看原文本刊更多论文

用于快速内存跟踪收集的静态二进制检测线程模型

为了实现高水平的性能，数据密集型应用，如无人机监控馈送的实时处理，将需要使用混合进程间消息传递(例如MPI和SHMEM)和进程内线程(例如pthreads和OpenMP)的多/多核处理器和协处理器的战略应用。为了便于程序设计决策，可以使用通过二进制检测收集的内存跟踪来理解数据密集型代码与多核处理器或多核协处理器的内存子系统之间的低级交互。为此，本文介绍了pmac Linux/x86高效二进制仪表工具箱(PEBIL)中增加的线程支持，并将PEBIL的线程模型与其他两个流行的Linux/x86二进制仪表平台(Pin和Dyninst)的线程模型进行了理论和实证比较。经验比较是基于收集NASA高级超级计算并行基准(NPBs)的openmp线程实现的内存地址跟踪的实验。这项工作表明，在PEBIL (7.7x)中收集多线程程序的全部内存地址跟踪的开销比在Pin (4.7x)中要高，两者都明显低于Dyninst (897x)。这项工作还表明，PEBIL能够独特地利用基于间隔的内存地址跟踪采样，在进入和退出采样周期时快速禁用和重新启用检测，从而显著降低内存地址跟踪收集的开销。为了以10%的采样率收集每个npb的内存地址流，与Pin的4.4倍和Dyninst的897x相比，PEBIL的平均速度降低了2.9倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

自引率

0.00%

发文量