{"title":"DLInfer: Python类型推断的静态切片深度学习","authors":"Yanyan Yan, Yang Feng, Hongcheng Fan, Baowen Xu","doi":"10.1109/ICSE48619.2023.00170","DOIUrl":null,"url":null,"abstract":"Python programming language has gained enor-mous popularity in the past decades. While its flexibility signifi-cantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type infor-mation for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DLInfer: Deep Learning with Static Slicing for Python Type Inference\",\"authors\":\"Yanyan Yan, Yang Feng, Hongcheng Fan, Baowen Xu\",\"doi\":\"10.1109/ICSE48619.2023.00170\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Python programming language has gained enor-mous popularity in the past decades. While its flexibility signifi-cantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type infor-mation for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.\",\"PeriodicalId\":376379,\"journal\":{\"name\":\"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE48619.2023.00170\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE48619.2023.00170","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DLInfer: Deep Learning with Static Slicing for Python Type Inference
Python programming language has gained enor-mous popularity in the past decades. While its flexibility signifi-cantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type infor-mation for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.