smartPip: A Smart Approach to Resolving Python Dependency Conflict Issues

Chao Wang, Rongxin Wu, Haohao Song, J. Shu, Guoqing Li
Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, published 2022-10-10
DOI: 10.1145/3551349.3560437 (https://doi.org/10.1145/3551349.3560437)
Citations: 5

Abstract

As one of the representative software ecosystems, PyPI, together with the Python package management tool pip, makes it much easier for Python developers to manage the reuse of third-party libraries automatically, saving development time and cost. Despite this success in practice, a recent empirical study revealed the risks of dependency conflict (DC) issues and summarized their characteristics. However, the dependency resolution strategy on which that study was based has since been replaced by a new one, the backtracking strategy. To understand how this change affects the prior findings, we conducted an empirical study that revisits the characteristics of DC issues under the new strategy. Our study revealed that, of the two previously discovered DC issue manifestation patterns, one has changed significantly (Pattern A), while the other remains the same (Pattern B). We also observed that the resolution strategy for Pattern A DC issues is inefficient, while the one for Pattern B DC issues wastes time and space. Based on these findings, we propose a tool, smartPip, to overcome the limitations of these resolution strategies. To resolve Pattern A DC issues, instead of iteratively verifying each candidate dependency library, we leverage a pre-built knowledge base of library dependencies to collect version constraints for the libraries concerned, and then convert the version constraints into SMT expressions for solving. To resolve Pattern B DC issues, we improve the existing virtual environment solution to reuse local libraries as far as possible. Finally, we evaluated smartPip on three benchmark datasets of open source projects. The results showed that smartPip outperforms existing Python package management tools, including pip with the new strategy and Conda, in resolving Pattern A DC issues, achieving 1.19X - 1.60X speedups over the best baseline approach. Compared with the built-in Python virtual environment (venv), smartPip reduced storage space by 34.55% - 80.26% and achieved speedups of up to 2.26X - 6.53X in resolving Pattern B DC issues.
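
The Pattern A idea of collecting version constraints from a knowledge base and solving them together, rather than verifying candidates one at a time, can be illustrated with a small sketch. The abstract does not give smartPip's actual encoding, so everything here is an assumption made for illustration: the knowledge base layout, the library names and versions, and the functions `satisfies` and `resolve` are hypothetical. An exhaustive search also stands in for a real SMT solver.

```python
from itertools import product

# Hypothetical knowledge base: library -> version -> {dependency: allowed versions}.
# All library names and version numbers are made up for illustration.
KNOWLEDGE_BASE = {
    "app-a": {
        (1, 0): {"libcore": {(1, 0), (1, 1)}},
        (2, 0): {"libcore": {(2, 0)}},
    },
    "app-b": {
        (1, 0): {"libcore": {(1, 1), (2, 0)}},
        (1, 1): {"libcore": {(1, 1)}},
    },
    "libcore": {(1, 0): {}, (1, 1): {}, (2, 0): {}},
}


def satisfies(kb, assignment):
    """Return True if every chosen version's dependency constraints hold."""
    for lib, version in assignment.items():
        for dep, allowed in kb[lib][version].items():
            # A dependency missing from the assignment counts as unsatisfied.
            if assignment.get(dep) not in allowed:
                return False
    return True


def resolve(kb, libraries):
    """Pick one version per library so that all constraints hold at once.

    Candidates are tried newest-first, with libraries ordered by name.
    Returns None when no consistent assignment exists. This brute-force
    search is a stand-in for handing the constraints to an SMT solver.
    """
    names = sorted(libraries)
    candidates = [sorted(kb[name], reverse=True) for name in names]
    for combo in product(*candidates):
        assignment = dict(zip(names, combo))
        if satisfies(kb, assignment):
            return assignment
    return None


if __name__ == "__main__":
    print(resolve(KNOWLEDGE_BASE, ["app-a", "app-b", "libcore"]))
```

The point of the sketch is that all constraints are known up front from the knowledge base, so a conflict is detected without downloading or installing any candidate. A real SMT-based encoding would express the same constraints as logical formulas rather than enumerating combinations.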