{"title":"DynaPyt: Python的动态分析框架","authors":"A. Eghbali, Michael Pradel","doi":"10.1145/3540250.3549126","DOIUrl":null,"url":null,"abstract":"Python is a widely used programming language that powers important application domains such as machine learning, data analysis, and web applications. For many programs in these domains it is consequential to analyze aspects like security and performance, and with Python’s dynamic nature, it is crucial to be able to dynamically analyze Python programs. However, existing tools and frameworks do not provide the means to implement dynamic analyses easily and practitioners resort to implementing an ad-hoc dynamic analysis for their own use case. This work presents DynaPyt, the first general-purpose framework for heavy-weight dynamic analysis of Python programs. Compared to existing tools for other programming languages, our framework provides a wider range of analysis hooks arranged in a hierarchical structure, which allows developers to concisely implement analyses. DynaPyt features selective instrumentation and execution modification as well. We evaluate our framework on test suites of 9 popular open-source Python projects, 1,268,545 lines of code in total, and show that it, by and large, preserves the semantics of the original execution. The running time of DynaPyt is between 1.2x and 16x times the original execution time, which is in line with similar frameworks designed for other languages, and 5.6%–88.6% faster than analyses using a built-in tracing API offered by Python. We also implement multiple analyses, show the simplicity of implementing them and some potential use cases of DynaPyt. Among the analyses implemented are: an analysis to detect a memory blow up in Pytorch programs, a taint analysis to detect SQL injections, and an analysis to warn about a runtime performance anti-pattern.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"49 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"DynaPyt: a dynamic analysis framework for Python\",\"authors\":\"A. Eghbali, Michael Pradel\",\"doi\":\"10.1145/3540250.3549126\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Python is a widely used programming language that powers important application domains such as machine learning, data analysis, and web applications. For many programs in these domains it is consequential to analyze aspects like security and performance, and with Python’s dynamic nature, it is crucial to be able to dynamically analyze Python programs. However, existing tools and frameworks do not provide the means to implement dynamic analyses easily and practitioners resort to implementing an ad-hoc dynamic analysis for their own use case. This work presents DynaPyt, the first general-purpose framework for heavy-weight dynamic analysis of Python programs. Compared to existing tools for other programming languages, our framework provides a wider range of analysis hooks arranged in a hierarchical structure, which allows developers to concisely implement analyses. DynaPyt features selective instrumentation and execution modification as well. We evaluate our framework on test suites of 9 popular open-source Python projects, 1,268,545 lines of code in total, and show that it, by and large, preserves the semantics of the original execution. The running time of DynaPyt is between 1.2x and 16x times the original execution time, which is in line with similar frameworks designed for other languages, and 5.6%–88.6% faster than analyses using a built-in tracing API offered by Python. We also implement multiple analyses, show the simplicity of implementing them and some potential use cases of DynaPyt. Among the analyses implemented are: an analysis to detect a memory blow up in Pytorch programs, a taint analysis to detect SQL injections, and an analysis to warn about a runtime performance anti-pattern.\",\"PeriodicalId\":68155,\"journal\":{\"name\":\"软件产业与工程\",\"volume\":\"49 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"软件产业与工程\",\"FirstCategoryId\":\"1089\",\"ListUrlMain\":\"https://doi.org/10.1145/3540250.3549126\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"软件产业与工程","FirstCategoryId":"1089","ListUrlMain":"https://doi.org/10.1145/3540250.3549126","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Python is a widely used programming language that powers important application domains such as machine learning, data analysis, and web applications. For many programs in these domains it is consequential to analyze aspects like security and performance, and with Python’s dynamic nature, it is crucial to be able to dynamically analyze Python programs. However, existing tools and frameworks do not provide the means to implement dynamic analyses easily and practitioners resort to implementing an ad-hoc dynamic analysis for their own use case. This work presents DynaPyt, the first general-purpose framework for heavy-weight dynamic analysis of Python programs. Compared to existing tools for other programming languages, our framework provides a wider range of analysis hooks arranged in a hierarchical structure, which allows developers to concisely implement analyses. DynaPyt features selective instrumentation and execution modification as well. We evaluate our framework on test suites of 9 popular open-source Python projects, 1,268,545 lines of code in total, and show that it, by and large, preserves the semantics of the original execution. The running time of DynaPyt is between 1.2x and 16x times the original execution time, which is in line with similar frameworks designed for other languages, and 5.6%–88.6% faster than analyses using a built-in tracing API offered by Python. We also implement multiple analyses, show the simplicity of implementing them and some potential use cases of DynaPyt. Among the analyses implemented are: an analysis to detect a memory blow up in Pytorch programs, a taint analysis to detect SQL injections, and an analysis to warn about a runtime performance anti-pattern.