Empirical Software Engineering最新文献_第5页

Can we spot energy regressions using developers tests? 我们能否通过开发人员测试发现能量倒退？

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-023-10429-1

Benjamin Danglot, Jean-Rémy Falleri, Romain Rouvoy

{"title":"Can we spot energy regressions using developers tests?","authors":"Benjamin Danglot, Jean-Rémy Falleri, Romain Rouvoy","doi":"10.1007/s10664-023-10429-1","DOIUrl":"https://doi.org/10.1007/s10664-023-10429-1","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">\u0000<b>Context</b>\u0000</h3><p><i>Software Energy Consumption</i> is gaining more and more attention. In this paper, we tackle the problem of warning developers about the increase of SEC of their programs during <i>Continuous Integration</i> (CI).</p><h3 data-test=\"abstract-sub-heading\">\u0000<b>Objective</b>\u0000</h3><p>In this study, we investigate if the CI can leverage developers’ tests to perform <i>energy regression testing</i>. Energy regression is similar to performance regression but focuses on the energy consumption of the program instead of standard performance indicators, like execution time or memory consumption.</p><h3 data-test=\"abstract-sub-heading\">\u0000<b>Method</b>\u0000</h3><p>We perform an exploratory study of the usage of developers’ tests for energy regression testing. We first investigate if developers’ tests can be used to obtain stable SEC indicators. Then, we evaluate if comparing the SEC of developers’ tests between two versions can pinpoint energy regressions introduced by automated program mutations. Finally, we manually evaluate several real commits pinpointed by our approach.</p><h3 data-test=\"abstract-sub-heading\">\u0000<b>Results</b>\u0000</h3><p>Our study will pave the way for automated SEC regression tools that can be readily deployed inside an existing CI infrastructure to raise awareness of SEC issues among practitioners.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"61 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

On Refining the SZZ Algorithm with Bug Discussion Data 利用错误讨论数据完善 SZZ 算法

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-24 DOI: 10.1007/s10664-024-10511-2

Pooja Rani, Fernando Petrulio, Alberto Bacchelli

{"title":"On Refining the SZZ Algorithm with Bug Discussion Data","authors":"Pooja Rani, Fernando Petrulio, Alberto Bacchelli","doi":"10.1007/s10664-024-10511-2","DOIUrl":"https://doi.org/10.1007/s10664-024-10511-2","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Researchers testing hypotheses related to factors leading to low-quality software often rely on historical data, specifically on details regarding when defects were introduced into a codebase of interest. The prevailing techniques to determine the introduction of defects revolve around variants of the <span>SZZ</span> algorithm. This algorithm leverages information on the lines modified during a bug-fixing commit and finds when these lines were last modified, thereby identifying bug-introducing commits.</p><h3 data-test=\"abstract-sub-heading\">Objectives</h3><p>Despite several improvements and variants, <span>SZZ</span> struggles with accuracy, especially in cases of unrelated modifications or that touch files not involved in the introduction of the bug in the version control systems (aka <i>tangled commit</i> and <i>ghost commits</i>).</p><h3 data-test=\"abstract-sub-heading\">Methods</h3><p>Our research investigates whether and how incorporating content retrieved from bug discussions can address these issues by identifying the related and external files and thus improve the efficacy of the <span>SZZ</span> algorithm.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>To conduct our investigation, we take advantage of the links manually inserted by Mozilla developers in bug reports to signal which commits inserted bugs. Thus, we prepared the dataset, <i>RoTEB</i>, comprised of 12,472 bug reports. We first manually inspect a sample of 369 bug reports related to these bug-fixing or bug-introducing commits and investigate whether the files mentioned in these reports could be useful for <span>SZZ</span>. After we found evidence that the mentioned files are relevant, we augment <span>SZZ</span> with this information, using different strategies, and evaluate the resulting approach against multiple <span>SZZ</span> variations.</p><h3 data-test=\"abstract-sub-heading\">Conclusion</h3><p>We define a taxonomy outlining the rationale behind developers’ references to diverse files in their discussions. We observe that bug discussions often mention files relevant to enhancing the <span>SZZ</span> algorithm’s efficacy. Then, we verify that integrating these file references augments the precision of <span>SZZ</span> in pinpointing bug-introducing commits. Yet, it does not markedly influence recall. These results deepen our comprehension of the usefulness of bug discussions for <span>SZZ</span>. Future work can leverage our dataset and explore other techniques to further address the problem of tangled commits and ghost commits. Data & material: https://zenodo.org/records/11484723.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"94 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Test-based patch clustering for automatically-generated patches assessment 基于测试的补丁聚类，用于自动生成的补丁评估

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-24 DOI: 10.1007/s10664-024-10503-2

Matias Martinez, Maria Kechagia, Anjana Perera, Justyna Petke, Federica Sarro, Aldeida Aleti

{"title":"Test-based patch clustering for automatically-generated patches assessment","authors":"Matias Martinez, Maria Kechagia, Anjana Perera, Justyna Petke, Federica Sarro, Aldeida Aleti","doi":"10.1007/s10664-024-10503-2","DOIUrl":"https://doi.org/10.1007/s10664-024-10503-2","url":null,"abstract":"<p>Previous studies have shown that Automated Program Repair (<span>apr</span>) techniques suffer from the overfitting problem. Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Therefore, the patches generated by <span>apr</span> tools need to be validated by human programmers, which can be very costly, and prevents <span>apr</span> tool adoption in practice. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel light-weight test-based patch clustering approach called <span>xTestCluster</span>, which clusters patches based on their dynamic behavior. <span>xTestCluster</span> is applied after the patch generation phase in order to analyze the generated patches from one or more repair tools and to provide more information about those patches for facilitating patch assessment. The novelty of <span>xTestCluster</span> lies in using information from execution of newly generated test cases to cluster patches generated by multiple APR approaches. A cluster is formed of patches that fail on the same generated test cases. The output from <span>xTestCluster</span> gives developers <i>a)</i> a way of reducing the number of patches to analyze, as they can focus on analyzing a sample of patches from each cluster, <i>b)</i> additional information (new test cases and their results) attached to each patch. After analyzing 902 plausible patches from 21 Java <span>apr</span> tools, our results show that <span>xTestCluster</span> is able to reduce the number of patches to review and analyze with a median of 50%. <span>xTestCluster</span> can save a significant amount of time for developers that have to review the multitude of patches generated by <span>apr</span> tools, and provides them with new test cases that expose the differences in behavior between generated patches. Moreover, <span>xTestCluster</span> can complement other patch assessment techniques that help detect patch misclassifications.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"35 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Free open source communities sustainability: Does it make a difference in software quality? 自由开放源码社区的可持续性：它对软件质量有影响吗？

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-23 DOI: 10.1007/s10664-024-10529-6

Adam Alami, Raúl Pardo, Johan Linåker

{"title":"Free open source communities sustainability: Does it make a difference in software quality?","authors":"Adam Alami, Raúl Pardo, Johan Linåker","doi":"10.1007/s10664-024-10529-6","DOIUrl":"https://doi.org/10.1007/s10664-024-10529-6","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Free and Open Source Software (FOSS) communities’ ability to stay viable and productive over time is pivotal for society as they maintain the building blocks that digital infrastructure, products, and services depend on. Sustainability may, however, be characterized from multiple aspects, and less is known how these aspects interplay and impact community outputs, and software quality specifically.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>This study, therefore, aims to empirically explore how the different aspects of FOSS sustainability impact software quality.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>16 sustainability metrics across four categories were sampled and applied to a set of 217 OSS projects sourced from the Apache Software Foundation Incubator program. The impact of a decline in the sustainability metrics was analyzed against eight software quality metrics using Bayesian data analysis, which incorporates probability distributions to represent the regression coefficients and intercepts.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>Findings suggest that selected sustainability metrics do not significantly affect defect density or code coverage. However, a positive impact of community age was observed on specific code quality metrics, such as risk complexity, number of very large files, and code duplication percentage. Interestingly, findings show that even when communities are experiencing sustainability, certain code quality metrics are negatively impacted.</p><h3 data-test=\"abstract-sub-heading\">Conclusion</h3><p>Findings imply that code quality practices are not consistently linked to sustainability, and defect management and prevention may be prioritized over the former. Results suggest that growth, resulting in a more complex and large codebase, combined with a probable lack of understanding of code quality standards, may explain the degradation in certain aspects of code quality.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"165 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Explaining poor performance of text-based machine learning models for vulnerability detection 解释基于文本的机器学习模型在漏洞检测中表现不佳的原因

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-22 DOI: 10.1007/s10664-024-10519-8

Kollin Napier, Tanmay Bhowmik, Zhiqian Chen

{"title":"Explaining poor performance of text-based machine learning models for vulnerability detection","authors":"Kollin Napier, Tanmay Bhowmik, Zhiqian Chen","doi":"10.1007/s10664-024-10519-8","DOIUrl":"https://doi.org/10.1007/s10664-024-10519-8","url":null,"abstract":"<p>With an increase of severity in software vulnerabilities, machine learning models are being adopted to combat this threat. Given the possibilities towards usage of such models, research in this area has introduced various approaches. Although models may differ in performance, there is an overall lack of explainability in understanding how a model learns and predicts. Furthermore, recent research suggests that models perform poorly in detecting vulnerabilities when interpreting source code as text, known as “text-based” models. To help explain this poor performance, we explore the dimensions of explainability. From recent studies on text-based models, we experiment with removal of overlapping features present in training and testing datasets, deemed “cross-cutting”. We conduct scenario experiments removing such “cross-cutting” data and reassessing model performance. Based on the results, we examine how removal of these “cross-cutting” features may affect model performance. Our results show that removal of “cross-cutting” features may provide greater performance of models in general, thus leading to explainable dimensions regarding data dependency and agnostic models. Overall, we conclude that model performance can be improved, and explainable aspects of such models can be identified via empirical analysis of the models’ performance.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"36 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141739973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Towards Exploring the Limitations of Test Selection Techniques on Graph Neural Networks: An Empirical Study 探索图神经网络测试选择技术的局限性：实证研究

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-22 DOI: 10.1007/s10664-024-10515-y

Xueqi Dang, Yinghua Li, Wei Ma, Yuejun Guo, Qiang Hu, Mike Papadakis, Maxime Cordy, Yves Le Traon

{"title":"Towards Exploring the Limitations of Test Selection Techniques on Graph Neural Networks: An Empirical Study","authors":"Xueqi Dang, Yinghua Li, Wei Ma, Yuejun Guo, Qiang Hu, Mike Papadakis, Maxime Cordy, Yves Le Traon","doi":"10.1007/s10664-024-10515-y","DOIUrl":"https://doi.org/10.1007/s10664-024-10515-y","url":null,"abstract":"<p>Graph Neural Networks (GNNs) have gained prominence in various domains, such as social network analysis, recommendation systems, and drug discovery, due to their ability to model complex relationships in graph-structured data. GNNs can exhibit incorrect behavior, resulting in severe consequences. Therefore, testing is necessary and pivotal. However, labeling all test inputs for GNNs can be prohibitively costly and time-consuming, especially when dealing with large and complex graphs. In response to these challenges, test selection has emerged as a strategic approach to alleviate labeling expenses. The objective of test selection is to select a subset of tests from the complete test set. While various test selection techniques have been proposed for traditional deep neural networks (DNNs), their adaptation to GNNs presents unique challenges due to the distinctions between DNN and GNN test data. Specifically, DNN test inputs are independent of each other, whereas GNN test inputs (nodes) exhibit intricate interdependencies. Therefore, it remains unclear whether DNN test selection approaches can perform effectively on GNNs. To fill the gap, we conduct an empirical study that systematically evaluates the effectiveness of various test selection methods in the context of GNNs, focusing on three critical aspects: <b>1) Misclassification detection</b>: selecting test inputs that are more likely to be misclassified; <b>2) Accuracy estimation</b>: selecting a small set of tests to precisely estimate the accuracy of the whole testing set; <b>3) Performance enhancement</b>: selecting retraining inputs to improve the GNN accuracy. Our empirical study encompasses 7 graph datasets and 8 GNN models, evaluating 22 test selection approaches. Our study includes not only node classification datasets but also graph classification datasets. Our findings reveal that: 1) In GNN misclassification detection, confidence-based test selection methods, which perform well in DNNs, do not demonstrate the same level of effectiveness; 2) In terms of GNN accuracy estimation, clustering-based methods, while consistently performing better than random selection, provide only slight improvements; 3) Regarding selecting inputs for GNN performance improvement, test selection methods, such as confidence-based and clustering-based test selection methods, demonstrate only slight effectiveness; 4) Concerning performance enhancement, node importance-based test selection methods are not suitable, and in many cases, they even perform worse than random selection.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"92 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141739972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Prioritizing test cases for deep learning-based video classifiers 确定基于深度学习的视频分类器测试用例的优先级

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-22 DOI: 10.1007/s10664-024-10520-1

Yinghua Li, Xueqi Dang, Lei Ma, Jacques Klein, Tegawendé F. Bissyandé

{"title":"Prioritizing test cases for deep learning-based video classifiers","authors":"Yinghua Li, Xueqi Dang, Lei Ma, Jacques Klein, Tegawendé F. Bissyandé","doi":"10.1007/s10664-024-10520-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10520-1","url":null,"abstract":"<p>The widespread adoption of video-based applications across various fields highlights their importance in modern software systems. However, in comparison to images or text, labelling video test cases for the purpose of assessing system accuracy can lead to increased expenses due to their temporal structure and larger volume. Test prioritization has emerged as a promising approach to mitigate the labeling cost, which prioritizes potentially misclassified test inputs so that such inputs can be identified earlier with limited time and manual labeling efforts. However, applying existing prioritization techniques to video test cases faces certain limitations: they do not account for the unique temporal information present in video data. Unlike static image datasets that only contain spatial information, video inputs consist of multiple frames that capture the dynamic changes of objects over time. In this paper, we propose VRank, the first test prioritization approach designed specifically for video test inputs. The fundamental idea behind VRank is that video-type tests with a higher probability of being misclassified by the evaluated DNN classifier are considered more likely to reveal faults and will be prioritized higher. To this end, we train a ranking model with the aim of predicting the probability of a given test input being misclassified by a DNN classifier. This prediction relies on four types of generated features: temporal features (TF), video embedding features (EF), prediction features (PF), and uncertainty features (UF). We rank all test inputs in the target test set based on their misclassification probabilities. Videos with a higher likelihood of being misclassified will be prioritized higher. We conducted an empirical evaluation to assess the performance of VRank, involving 120 subjects with both natural and noisy datasets. The experimental results reveal VRank outperforms all compared test prioritization methods, with an average improvement of 5.76%<span>(sim )</span>46.51% on natural datasets and 4.26%<span>(sim )</span>53.56% on noisy datasets.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"45 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141739839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Does using Bazel help speed up continuous integration builds? 使用 Bazel 是否有助于加快持续集成构建？

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-19 DOI: 10.1007/s10664-024-10497-x

Shenyu Zheng, Bram Adams, Ahmed E. Hassan

{"title":"Does using Bazel help speed up continuous integration builds?","authors":"Shenyu Zheng, Bram Adams, Ahmed E. Hassan","doi":"10.1007/s10664-024-10497-x","DOIUrl":"https://doi.org/10.1007/s10664-024-10497-x","url":null,"abstract":"<p>A long continuous integration (CI) build forces developers to wait for CI feedback before starting subsequent development activities, leading to time wasted. In addition to a variety of build scheduling and test selection heuristics studied in the past, new artifact-based build technologies like Bazel have built-in support for advanced performance optimizations such as parallel build and incremental build (caching of build results). However, little is known about the extent to which new build technologies like Bazel deliver on their promised benefits, especially for long-build duration projects. In this study, we collected 383 Bazel projects from GitHub, then studied their parallel and incremental build usage of Bazel in popular CI services (GitHub Actions, CircleCI, Travis CI, or Buildkite), and compared the results with Maven projects. We conducted 3,500 experiments on 383 Bazel projects and analyzed the build logs of a subset of 70 buildable projects to evaluate the performance impact of Bazel’s parallel builds. Additionally, we performed 102,232 experiments on the 70 buildable projects’ last 100 commits to evaluate Bazel’s incremental build performance. Our results show that 31.23% of Bazel projects adopt a CI service but do not use Bazel in the CI service, while for those who do use Bazel in CI, 27.76% of them use other tools to facilitate Bazel’s execution. Compared to sequential builds, the median speedups for long-build duration projects are 2.00x, 3.84x, 7.36x, and 12.80x, at parallelism degrees 2, 4, 8, and 16, respectively, even though, compared to a clean build, applying incremental build achieves a median speedup of 4.22x (with a build system tool-independent CI cache) and 4.71x (with a build system tool-specific cache) for long-build duration projects. Our results provide guidance for developers to improve the usage of Bazel in their projects, and emphasize the importance of exploring modern build systems due to the current lack of literature and their potential advantages within contemporary software practices such as cloud computing and microservice.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"42 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141739974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An empirical study on cross-component dependent changes: A case study on the components of OpenStack 关于跨组件依赖性变更的实证研究：OpenStack 组件案例研究

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-13 DOI: 10.1007/s10664-024-10488-y

Ali Arabat, Mohammed Sayagh

{"title":"An empirical study on cross-component dependent changes: A case study on the components of OpenStack","authors":"Ali Arabat, Mohammed Sayagh","doi":"10.1007/s10664-024-10488-y","DOIUrl":"https://doi.org/10.1007/s10664-024-10488-y","url":null,"abstract":"<p>Modern software systems are composed of several loosely coupled components. Typical examples of such systems are plugin-based systems, microservices, and modular software systems. Such types of software systems have several advantages motivating a large body of research to propose approaches for the migration from monolithic software systems to modular architecture (mainly microservices). However, a few prior works investigated how to assist practitioners post-migration. In fact, these studies reported that having independent components is difficult to achieve, leading to several evolution challenges that are still manually handled. In this paper, we conduct an empirical study on OpenStack and its 1,310 projects (aka., components) to better understand how the changes to a given component depend on changes of other components (aka., cross-component changes) so managers can better plan for their changes in a cross-component project, and researchers can design better solutions to help practitioners in such a co-evolution and the maintenance of multi-component software systems. We observe that the concept of ownership exists in the context of OpenStack, as different teams do not share the responsibility over the studied components of OpenStack. Despite that, dependencies across different components are not exceptional but exist in all releases. In fact, we observe that 52,069 OpenStack changes (almost 10% of all the changes) depend on changes in other components. Such a number of cross-component changes continuously increased over different years and releases, up to a certain release in which OpenStack decided to make a major refactoring of its project by archiving over 500 projects. We also found that a good percentage of cross-component changes (20.85%) end up being abandoned, leading to wasteful synchronization efforts between different teams. These dependent changes occur for different reasons that we qualitatively identified, among which configuration-related (34.64%) changes are the most common, while developers create cross-component changes for testing purposes then abandon such changes as the most prevalent category (38.45%). These cross-project changes lead to collaboration between different teams to synchronize their changes since 24.55% of the pairs of two cross-component changes are made by different developers, while the second change is reviewed by the developer of the first change of the pair (71.63%). Even when a developer makes both changes, that developer ends up working on a project that she/he is less familiar with. Our results shed light on how different components end up being dependent on each other in terms of their maintenance, which can help managers better plan their changes and guide researchers in proposing appropriate approaches for assisting in the maintenance of multi-component systems.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"93 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fixing Dockerfile smells: an empirical study 修复 Dockerfile 的气味：实证研究

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-06 DOI: 10.1007/s10664-024-10471-7

Giovanni Rosa, Federico Zappone, Simone Scalabrino, Rocco Oliveto

{"title":"Fixing Dockerfile smells: an empirical study","authors":"Giovanni Rosa, Federico Zappone, Simone Scalabrino, Rocco Oliveto","doi":"10.1007/s10664-024-10471-7","DOIUrl":"https://doi.org/10.1007/s10664-024-10471-7","url":null,"abstract":"<p>Docker is the <i>de facto</i> standard for software containerization. A Dockerfile contains the requirements to build a Docker image containing a target application. There are several best practice rules for writing Dockerfiles, but the developers do not always follow them. Violations of such practices, known as Dockerfile smells, can negatively impact the reliability and performance of Docker images. Previous studies showed that Dockerfile smells are widely diffused, and there is a lack of automatic tools that support developers in fixing them. However, it is still unclear what Dockerfile smells get fixed by developers and to what extent developers would be willing to fix smells in the first place. The aim of our study is twofold. First, we want to understand what Dockerfiles smells receive more attention from developers, i.e., are fixed more frequently in the history of open-source projects. Second, we want to check if developers are willing to accept changes aimed at fixing Dockerfile smells (e.g., generated by an automated tool), to understand if they care about them. We evaluated the survivability of Dockerfile smells from a total of 53,456 unique Dockerfiles, where we manually validated a large sample of smell-removing commits to understand (i) if developers performed the change with the intention of removing bad practices, and (ii) if they were aware of the removed smell. In the second part, we used a rule-based tool to automatically fix Dockerfile smells. Then, we proposed such fixes to developers via pull requests. Finally, we quantitatively and qualitatively evaluated the outcome after a monitoring period of more than 7 months. The results of our study showed that most developers pay more attention to changes aimed at improving the performance of Dockerfiles (image size and build time). Moreover, they are willing to accept the fixes for the most common smells, with some exceptions (e.g., missing version pinning for OS packages).</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"18 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0