Unboxing Default Argument Breaking Changes in 1 + 2 data science libraries

Impact Factor: 4.1 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Software Engineering)

Journal of Systems and Software, Volume 229, Article 112460 | Pub Date: 2025-05-10 | DOI: 10.1016/j.jss.2025.112460

João Eduardo Montandon, Luciana Lourdes Silva, Cristiano Politowski, Daniel Prates, Arthur de Brito Bonifácio, Ghizlane El Boussaidi

Full text: https://www.sciencedirect.com/science/article/pii/S0164121225001281

Citations: 0

Abstract



Data Science (DS) has become a cornerstone of modern software, enabling data-driven decisions that improve companies' services. Following modern software development practices, data scientists use third-party libraries to support their tasks. Because the APIs provided by these libraries often require an extensive list of arguments, data scientists rely on default values to simplify their usage. These default values can change over time, leading to a specific type of breaking change, defined as a Default Argument Breaking Change (DABC). This work reveals 93 DABCs in three Python libraries frequently used in Data Science tasks (Scikit Learn, NumPy, and Pandas) and studies their potential impact on more than 500K client applications. We find that the occurrence of DABCs varies significantly depending on the library: 35% of Scikit Learn clients are affected, while only 0.13% of NumPy clients are impacted. The main reason for introducing DABCs is to enhance API maintainability, but they often change the function's behavior. We discuss the importance of managing DABCs in third-party DS libraries and provide insights for developers to mitigate the potential impact of these changes in their applications.
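To make the notion of a DABC concrete, the following is a minimal sketch and is not taken from the paper. The functions `spread_v1` and `spread_v2` are hypothetical stand-ins for two releases of the same library function, and `changed_defaults` is an assumed helper that compares signatures. It is one simple way such changes could be spotted, not necessarily the authors' detection method.

```python
import inspect
import statistics


def spread_v1(values, ddof=0):
    """Hypothetical library function, release 1 (default ddof=0)."""
    mean = statistics.fmean(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - ddof)) ** 0.5


def spread_v2(values, ddof=1):
    """Same hypothetical function in release 2: only the default changed."""
    mean = statistics.fmean(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - ddof)) ** 0.5


def changed_defaults(old_func, new_func):
    """Return {parameter: (old_default, new_default)} for defaults that differ."""
    old_params = inspect.signature(old_func).parameters
    new_params = inspect.signature(new_func).parameters
    empty = inspect.Parameter.empty
    changes = {}
    for name, old in old_params.items():
        new = new_params.get(name)
        if new is None:
            continue  # parameter removed: a different kind of breaking change
        if old.default is empty or new.default is empty:
            continue  # no default on one side, so not a default-value change
        if old.default != new.default:
            changes[name] = (old.default, new.default)
    return changes


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# A client that relies on the default silently gets a new result after upgrading.
implicit_before = spread_v1(data)
implicit_after = spread_v2(data)

# A client that passes the argument explicitly is unaffected by the change.
explicit_before = spread_v1(data, ddof=0)
explicit_after = spread_v2(data, ddof=0)

# Signature comparison flags the changed default.
dabcs = changed_defaults(spread_v1, spread_v2)
print(implicit_before, implicit_after, dabcs)
```

In this sketch the call site does not change at all, yet its result does. Passing arguments explicitly where the result depends on them is one way a client could shield itself from this kind of change.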


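Source journal

Journal of Systems and Software (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 8.60
Self-citation rate: 5.70%
Articles published: 193
Review time: 16 weeks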

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.