DyPyBench: A Benchmark of Executable Python Software

ArXiv · Publication date: 2024-03-01 · DOI: 10.1145/3643742

Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel


Abstract

Python has emerged as one of the most popular programming languages and is used extensively in domains such as machine learning, data analysis, and web applications. Python's dynamic nature and widespread use make it an attractive target for dynamic program analysis. However, whereas such benchmarks exist for other popular languages, there is currently no comprehensive benchmark suite of executable Python projects, which hinders the development of dynamic analyses. This work addresses this gap by presenting DyPyBench, the first benchmark of Python projects that is large-scale, diverse, ready to run (i.e., with fully configured and prepared test suites), and ready to analyze (through integration with the DynaPyt dynamic analysis framework). The benchmark comprises 50 popular open-source projects from various application domains, with a total of 681k lines of Python code and 30k test cases. DyPyBench enables various applications in testing and dynamic analysis, three of which we explore in this work: (i) gathering dynamic call graphs and empirically comparing them with statically computed call graphs, which exposes and quantifies limitations of existing call graph construction techniques for Python; (ii) using DyPyBench to build a training dataset for LExecutor, a neural model that learns to predict values that would otherwise be missing at runtime; and (iii) using dynamically gathered execution traces to mine API usage specifications, which establishes a baseline for future work on specification mining for Python. We envision DyPyBench providing a basis for other dynamic analyses and for studying the runtime behavior of Python code.
