Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries

2011 International Conference on Parallel Architectures and Compilation Techniques. Pub Date: 2011-10-10. DOI: 10.1109/PACT.2011.13

Bin Ren, G. Agrawal


Citations: 4

Abstract

Programmer productivity considerations are increasing the popularity of interpreted languages like Python. At the same time, for applications where performance is important, these languages clearly fall short, even on uniprocessors. In addition, the use of dynamic data structures in a language like Python makes it very hard to use emerging libraries that enable execution on multi-core and many-core architectures. This paper presents a framework for compiling Python to use multi-core and many-core libraries. The key component of our framework is a suite of algorithms for replacing dynamic and/or nested data structures with arrays while minimizing unnecessary data copying costs. This involves a novel use of an existing partial redundancy elimination algorithm, the development of a new demand-driven interprocedural partial redundancy algorithm, a data flow formulation for determining that the contents of a data structure are all of the same type, and a linearization algorithm. We have evaluated our framework using data mining and two linear algebra applications written in pure Python. The key observations were: 1) the code generated by our framework is only 10% to 20% slower than hand-written C code that invokes the same libraries, 2) our optimizations significantly improve performance in most cases, and 3) we outperform both interpreted Python and the C++ code generated by an existing tool by one to two orders of magnitude.
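To make the idea of replacing a nested data structure with an array concrete, the following is a minimal runtime sketch and not the paper's compiler algorithm. It assumes a two-level list of floats, and the names `element_type`, `linearize`, and `row_view` are hypothetical. It first checks that all elements share one type (a runtime stand-in for the data flow formulation mentioned in the abstract, which would be done statically), then copies the data once into a flat array with row offsets.

```python
from array import array


def element_type(nested):
    """Return the single scalar type found in a nested list, or None if mixed or empty."""
    seen = set()
    stack = [nested]
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            stack.extend(item)      # descend into sublists
        else:
            seen.add(type(item))    # record the type of each scalar
    return seen.pop() if len(seen) == 1 else None


def linearize(rows):
    """Flatten a list of lists of floats into one contiguous array plus row offsets.

    Assumption for this sketch: exactly two nesting levels and float elements.
    """
    if element_type(rows) is not float:
        raise TypeError("expected a homogeneous nested list of floats")
    data = array("d")               # contiguous C doubles
    offsets = [0]
    for row in rows:
        data.extend(row)            # single copy of each element
        offsets.append(len(data))   # end of this row == start of the next
    return data, offsets


def row_view(data, offsets, i):
    """Return row i of the original structure from the flat representation."""
    return data[offsets[i]:offsets[i + 1]]
```

A flat buffer of this kind is what a C library typically expects, which is why the copy is worth doing only once; for example, `linearize([[1.0, 2.0], [3.0]])` yields a four-byte-aligned buffer of three doubles together with the offsets `[0, 2, 3]`, and `row_view` recovers each original row without consulting the nested list.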

