Using shared library interposing for transparent application acceleration in systems with heterogeneous hardware accelerators

ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors. Pub Date: 2010-07-07. DOI: 10.1109/ASAP.2010.5540798

Tobias Beisel, Manuel Niekamp, Christian Plessl


Citations: 7

Abstract

Today's computer systems increasingly comprise heterogeneous computing elements such as multi-core processors, graphics processing units, and specialized co-processors, which allow parallel processing. Programming applications to utilize such systems is a complex process and requires good knowledge of the hardware architecture. Automatic and transparent use of these resources is a major concern of domain-specific software developers and users. We present a new approach that uses shared library interposing to replace libraries in binary applications with highly optimized, accelerated versions. We developed a plugin-based framework that interposes shared library calls, delegates them to accelerator-specific libraries, and adapts them to the library-specific interface. Accelerator-specific plugins can be added with a high degree of automation. First steps were taken toward a fast and intelligent selection component that chooses the best possible accelerator for a shared library call. We show that such a framework can be used efficiently to apply shared library interposing to transparently speed up existing applications. The BLAS library for linear algebra served as an example for developing plugins for an acceleratable library. Runtimes of BLAS functions were measured on different architectures and differ significantly depending on the implementation and hardware used, indicating the potentially high speedups of the approach.
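The plugin and selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' framework: it models only the dispatch logic in Python, whereas real interposing happens at the dynamic-linker level on a compiled binary. All names here (`Plugin`, `Interposer`, `estimated_cost`, the `accelerator-stub` plugin) and the cost figures are hypothetical, chosen only to show how a call to a BLAS-style `ddot` might be routed to whichever backend a simple cost model prefers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class Plugin:
    """Hypothetical accelerator plugin: a name, an implementation, a cost model."""
    name: str
    ddot: Callable[[Vector, Vector], float]
    # Estimated runtime in arbitrary units for vectors of length n (assumed model).
    estimated_cost: Callable[[int], float]


def cpu_ddot(x: Vector, y: Vector) -> float:
    """Plain dot product, standing in for the original CPU library routine."""
    return sum(a * b for a, b in zip(x, y))


def accelerator_stub_ddot(x: Vector, y: Vector) -> float:
    """Stand-in for an accelerator library call; a real plugin would adapt the
    arguments to the accelerator library's interface before delegating."""
    return sum(a * b for a, b in zip(x, y))


# Assumed cost models: the CPU scales linearly with n, while the accelerator
# pays a fixed transfer overhead but has a lower per-element cost.
CPU = Plugin("cpu", cpu_ddot, lambda n: float(n))
ACCEL = Plugin("accelerator-stub", accelerator_stub_ddot, lambda n: 1000.0 + n / 10.0)


class Interposer:
    """Intercepts ddot calls and delegates each one to the cheapest plugin."""

    def __init__(self, plugins: Sequence[Plugin]) -> None:
        if not plugins:
            raise ValueError("at least one plugin is required")
        self.plugins = list(plugins)
        self.log: List[str] = []  # names of the plugins chosen, in call order

    def select(self, n: int) -> Plugin:
        """Selection component: pick the plugin with the lowest estimated cost."""
        return min(self.plugins, key=lambda p: p.estimated_cost(n))

    def ddot(self, x: Vector, y: Vector) -> float:
        """Same signature as the wrapped routine, so callers need no changes."""
        if len(x) != len(y):
            raise ValueError("vectors must have the same length")
        plugin = self.select(len(x))
        self.log.append(plugin.name)
        return plugin.ddot(x, y)
```

Under these assumed cost models, short vectors stay on the CPU and long ones go to the accelerator stub, while the caller sees the same function signature in both cases. The actual selection criteria and plugin interface of the paper's framework are not specified in the abstract.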



