Understanding Software-2.0

ACM Transactions on Software Engineering and Methodology (TOSEM) | Pub Date: 2021-07-23 | DOI: 10.1145/3453478

Malinda Dilhara, Ameya Ketkar, Danny Dig


Citations: 14

Abstract

Enabled by a rich ecosystem of Machine Learning (ML) libraries, programming using learned models, i.e., Software-2.0, has gained substantial adoption. However, we do not know what challenges developers encounter when they use ML libraries. With this knowledge gap, researchers miss opportunities to contribute to new research directions, tool builders do not invest resources where automation is most needed, library designers cannot make informed decisions when releasing ML library versions, and developers fail to use common practices when using ML libraries. We present the first large-scale quantitative and qualitative empirical study to shed light on how developers in Software-2.0 use ML libraries, and how this evolution affects their code. Particularly, using static analysis we perform a longitudinal study of 3,340 top-rated open-source projects with 46,110 contributors. To further understand the challenges of ML library evolution, we survey 109 developers who introduce and evolve ML libraries. Using this rich dataset we reveal several novel findings. Among others, we found an increasing trend of using ML libraries: The ratio of new Python projects that use ML libraries increased from 2% in 2013 to 50% in 2018. We identify several usage patterns including the following: (i) 36% of the projects use multiple ML libraries to implement various stages of the ML workflows, (ii) developers update ML libraries more often than the traditional libraries, (iii) strict upgrades are the most popular for ML libraries among other update kinds, (iv) ML library updates often result in cascading library updates, and (v) ML libraries are often downgraded (22.04% of cases). We also observed unique challenges when evolving and maintaining Software-2.0 such as (i) binary incompatibility of trained ML models and (ii) benchmarking ML models. 
Finally, we present actionable implications of our findings for researchers, tool builders, developers, educators, library vendors, and hardware vendors.

