PyCG: Practical Call Graph Generation in Python

Vitalis Salis, Thodoris Sotiropoulos, Panos Louridas, D. Spinellis, Dimitris Mitropoulos
{"title":"PyCG: Practical Call Graph Generation in Python","authors":"Vitalis Salis, Thodoris Sotiropoulos, Panos Louridas, D. Spinellis, Dimitris Mitropoulos","doi":"10.1109/ICSE43902.2021.00146","DOIUrl":null,"url":null,"abstract":"Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance. We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high rates of precision ~99.2% and adequate recall ~69.9%. Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's \"security advisory\" notification service using a real-world example.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00146","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance. We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high rates of precision ~99.2% and adequate recall ~69.9%. Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example.
Python中的实用调用图生成
调用图在不同的上下文中扮演着重要的角色,例如概要分析和漏洞传播分析。当涉及到模块化并包含动态特性和高阶函数的高级语言时,以有效的方式生成调用图可能是一项具有挑战性的任务。尽管Python语言很受欢迎,但很少有工具旨在为Python程序生成调用图。更糟糕的是,这些工具存在一些有效性问题,限制了它们在实际程序中的实用性。我们提出了一种在Python中生成调用图的实用的静态方法。我们通过过程间分析计算函数、变量、类和模块的程序标识符之间的所有赋值关系。基于这些赋值关系,我们通过解析对潜在调用函数的所有调用来生成最终的调用图。值得注意的是,底层分析被设计为高效和可伸缩的,处理几个Python特性,如模块、生成器、函数闭包和多重继承。我们使用两个基准测试评估了我们的原型实现(我们称之为PyCG):一个包含小型Python程序的微基准测试套件和一组包含几个流行的真实Python包的宏观基准测试。我们的结果表明,PyCG可以在不到一秒钟的时间内有效地处理数千行代码(1k LoC平均0.38秒)。此外,它在准确率和召回率方面都优于Python的最新技术:PyCG达到了高准确率~99.2%和足够的召回率~69.9%。最后,我们使用一个真实的例子,通过展示对GitHub的“安全咨询”通知服务的潜在增强,来演示PyCG如何帮助依赖性影响分析。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信