DLInfer: Python类型推断的静态切片深度学习

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) Pub Date : 2023-05-01 DOI:10.1109/ICSE48619.2023.00170

Yanyan Yan, Yang Feng, Hongcheng Fan, Baowen Xu

{"title":"DLInfer: Python类型推断的静态切片深度学习","authors":"Yanyan Yan, Yang Feng, Hongcheng Fan, Baowen Xu","doi":"10.1109/ICSE48619.2023.00170","DOIUrl":null,"url":null,"abstract":"Python programming language has gained enor-mous popularity in the past decades. While its flexibility signifi-cantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type infor-mation for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DLInfer: Deep Learning with Static Slicing for Python Type Inference\",\"authors\":\"Yanyan Yan, Yang Feng, Hongcheng Fan, Baowen Xu\",\"doi\":\"10.1109/ICSE48619.2023.00170\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Python programming language has gained enor-mous popularity in the past decades. While its flexibility signifi-cantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type infor-mation for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.\",\"PeriodicalId\":376379,\"journal\":{\"name\":\"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE48619.2023.00170\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE48619.2023.00170","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

Python编程语言在过去几十年中获得了极大的普及。虽然它的灵活性显著地提高了软件开发的生产力，但动态类型特性对软件维护和质量保证提出了挑战。为了方便编程和类型错误检查，Python编程语言提供了一种类型提示机制，使开发人员能够为变量注释类型信息。然而，这种手工注释过程通常需要大量的资源，并且可能会引入错误。在本文中，我们提出了一种深度学习类型推断技术，即DLInfer，用于自动推断Python程序的类型信息。DLInfer通过静态分析收集变量的切片语句，然后使用Unigram语言模型算法对其进行矢量化。基于向量化切片特征，设计了双向门控循环单元模型来学习类型传播信息并进行推理。为了验证DLInfer的有效性，我们对700个开源项目进行了广泛的实证研究。我们通过推断三种基本类型来评估它的准确性，包括内置类型、库类型和用户定义类型。通过大规模数据集的训练，DLInfer对于静态分析和人工标注可以获得类型信息的变量，平均达到了98.79%的Top-1准确率。对于只能通过动态分析获得类型信息的变量，DLInfer的平均类型推断准确率为83.03%。结果表明，DLInfer在推断类型方面是非常有效的。它有望应用于Python程序的各种软件工程任务中。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DLInfer: Deep Learning with Static Slicing for Python Type Inference

Python programming language has gained enor-mous popularity in the past decades. While its flexibility signifi-cantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type infor-mation for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量