Hongjie Ye, Wei Chen, Wensheng Dou, Guoquan Wu, Jun Wei
{"title":"Knowledge-Based Environment Dependency Inference for Python Programs","authors":"Hongjie Ye, Wei Chen, Wensheng Dou, Guoquan Wu, Jun Wei","doi":"10.1145/3510003.3510127","DOIUrl":null,"url":null,"abstract":"Besides third-party packages, the Python interpreter and system libraries are also critical dependencies of a Python program. In our empirical study, 34% programs are only compatible with specific Python interpreter versions, and 24% programs require specific system libraries. However, existing techniques mainly focus on inferring third-party package dependencies. Therefore, they can lack other necessary dependencies and violate version constraints, thus resulting in program build failures and runtime errors. This paper proposes a knowledge-based technique named PyEGo, which can automatically infer dependencies of third-party packages, the Python interpreter, and system libraries at compatible versions for Python programs. We first construct the dependency knowl-edge graph PyKG, which can portray the relations and constraints among third-party packages, the Python interpreter, and system libraries. Then, by querying PyKG with extracted program features, PyEGo constructs a program-related sub-graph with dependency candidates of the three types. It finally outputs the latest compatible dependency versions by solving constraints in the sub-graph. We evaluate PyEGo on 2,891 single-file Python gists, 100 open-source Python projects and 4,836 jupyter notebooks. The experimental re-sults show that PyEGo achieves better accuracy, 0.2x to 3.5x higher than the state-of-the-art approaches.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510003.3510127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Besides third-party packages, the Python interpreter and system libraries are also critical dependencies of a Python program. In our empirical study, 34% programs are only compatible with specific Python interpreter versions, and 24% programs require specific system libraries. However, existing techniques mainly focus on inferring third-party package dependencies. Therefore, they can lack other necessary dependencies and violate version constraints, thus resulting in program build failures and runtime errors. This paper proposes a knowledge-based technique named PyEGo, which can automatically infer dependencies of third-party packages, the Python interpreter, and system libraries at compatible versions for Python programs. We first construct the dependency knowl-edge graph PyKG, which can portray the relations and constraints among third-party packages, the Python interpreter, and system libraries. Then, by querying PyKG with extracted program features, PyEGo constructs a program-related sub-graph with dependency candidates of the three types. It finally outputs the latest compatible dependency versions by solving constraints in the sub-graph. We evaluate PyEGo on 2,891 single-file Python gists, 100 open-source Python projects and 4,836 jupyter notebooks. The experimental re-sults show that PyEGo achieves better accuracy, 0.2x to 3.5x higher than the state-of-the-art approaches.