"Challenges and practices of deep learning model reengineering: A case study on computer vision"
Wenxin Jiang, Vishnu Banna, Naveen Vivek, Abhinav Goel, Nicholas Synovic, George K. Thiruvathukal, James C. Davis
Empirical Software Engineering, published 2024-08-20. DOI: 10.1007/s10664-024-10521-0

Context: Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengineering. Deep learning model reengineering (reusing, replicating, adapting, and enhancing state-of-the-art deep learning approaches) is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing.

Objective: Prior work has characterized the challenges of deep learning model development, but as yet we know little about the deep learning model reengineering process and its common challenges. Prior work has examined DL systems from a "product" view, examining defects from projects regardless of the engineers' purpose. Our study examines reengineering activities from a "process" view and focuses on engineers specifically engaged in the reengineering process.

Method: Our goal is to understand the characteristics and challenges of deep learning model reengineering. We conducted a mixed-methods case study of this phenomenon, focusing on the context of computer vision. Our results draw on two data sources: defects reported in open-source reengineering projects, and interviews conducted with practitioners and the leaders of a reengineering team. From the defect data source, we analyzed 348 defects from 27 open-source deep learning projects. Meanwhile, our reengineering team replicated 7 deep learning models over two years; we interviewed 2 open-source contributors, 4 practitioners, and 6 reengineering team leaders to understand their experiences.

Results: Our results describe how deep learning-based computer vision techniques are reengineered, quantitatively analyze the distribution of defects in this process, and qualitatively discuss challenges and practices. We found that most defects (58%) are reported by re-users, and that reproducibility-related defects tend to be discovered during training (68% of them). Our analysis shows that most environment defects (88%) are interface defects, and that the largest share of environment defects (46%) is caused by API defects. We found that training defects have diverse symptoms and root causes. We identified four main challenges in the DL reengineering process: model operationalization, performance debugging, portability of DL operations, and customized data pipelines. Integrating our quantitative and qualitative data, we propose a novel reengineering workflow.

Conclusions: Our findings inform several conclusions, including: standardizing model reengineering practices, developing validation tools to support model reengineering, providing automated support beyond manual model reengineering, and measuring additional, currently unknown aspects of model reengineering.
{"title":"How does parenthood affect an ICT practitioner’s work? A survey study with fathers","authors":"Larissa Rocha, Edna Dias Canedo, Claudia Pinto Pereira, Carla Bezerra, Fabiana Freitas Mendes","doi":"10.1007/s10664-024-10534-9","DOIUrl":"https://doi.org/10.1007/s10664-024-10534-9","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Many studies have investigated the perception of software development teams about gender bias, inclusion policies, and the impact of remote work on productivity. The studies indicate that mothers and fathers working in the software industry had to reconcile homework, work activities, and child care.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>This study investigates the impact of parenthood on Information and Communications Technology (ICT) and how the fathers perceive the mothers’ challenges. Recognizing their difficulties and knowing the mothers’ challenges can be the first step towards making the work environment friendlier for everyone.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We surveyed 155 fathers from industry and academia from 10 different countries, however, most of the respondents are from Brazil (92.3%). Data was analyzed quantitatively and qualitatively. We employed Grounded Theory to identify factors related to (i) paternity leave, (ii) working time after paternity, (iii) childcare-related activities, (iv) prejudice at work after paternity, (v) fathers’ perception of prejudice against mothers, (vi) challenges and difficulties in paternity. We also conduct a correlational and regression analysis to explore some research questions further.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>In general, fathers do not suffer harassment or prejudice at work for being fathers. However, they perceive that mothers suffer distrust in the workplace and live with work overload because they have to dedicate themselves to many activities. They also suggested actions to mitigate parents’ difficulties.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>Despite some fathers wanting to participate more in taking care of their children, others do not even recognize the difficulties that mothers can face in the work. Therefore, it is important to explore the problems and implement actions to build a more parent-friendly work environment.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"15 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendations for analysing and meta-analysing small sample size software engineering experiments","authors":"Barbara Kitchenham, Lech Madeyski","doi":"10.1007/s10664-024-10504-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10504-1","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Software engineering (SE) experiments often have small sample sizes. This can result in data sets with non-normal characteristics, which poses problems as standard parametric meta-analysis, using the standardized mean difference (<i>StdMD</i>) effect size, assumes normally distributed sample data. Small sample sizes and non-normal data set characteristics can also lead to unreliable estimates of parametric effect sizes. Meta-analysis is even more complicated if experiments use complex experimental designs, such as two-group and four-group cross-over designs, which are popular in SE experiments.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>Our objective was to develop a validated and robust meta-analysis method that can help to address the problems of small sample sizes and complex experimental designs without relying upon data samples being normally distributed.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>To illustrate the challenges, we used real SE data sets. We built upon previous research and developed a robust meta-analysis method able to deal with challenges typical for SE experiments. We validated our method via simulations comparing <i>StdMD</i> with two robust alternatives: the probability of superiority (<span>(hat{p})</span>) and Cliffs’ <i>d</i>.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>We confirmed that many SE data sets are small and that small experiments run the risk of exhibiting non-normal properties, which can cause problems for analysing families of experiments. For simulations of individual experiments and meta-analyses of families of experiments, <span>(hat{p})</span> and Cliff’s <i>d</i> consistently outperformed <i>StdMD</i> in terms of negligible small sample bias. They also had better power for log-normal and Laplace samples, although lower power for normal and gamma samples. Tests based on <span>(hat{p})</span> always had better or equal power than tests based on Cliff’s <i>d</i>, and across all but one simulation condition, <span>(hat{p})</span> Type 1 error rates were less biased.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>Using <span>(hat{p})</span> is a low-risk option for analysing and meta-analysing data from small sample-size SE randomized experiments. Parametric methods are only preferable if you have prior knowledge of the data distribution.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"281 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmented testing to support manual GUI-based regression testing: An empirical study","authors":"Andreas Bauer, Julian Frattini, Emil Alégroth","doi":"10.1007/s10664-024-10522-z","DOIUrl":"https://doi.org/10.1007/s10664-024-10522-z","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Manual graphical user interface (GUI) software testing presents a substantial part of the overall practiced testing efforts, despite various research efforts to further increase test automation. Augmented Testing (AT), a novel approach for GUI testing, aims to aid manual GUI-based testing through a tool-supported approach where an intermediary visual layer is rendered between the system under test (SUT) and the tester, superimposing relevant test information.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>The primary objective of this study is to gather empirical evidence regarding AT’s efficiency compared to manual GUI-based regression testing. Existing studies involving testing approaches under the AT definition primarily focus on exploratory GUI testing, leaving a gap in the context of regression testing. As a secondary objective, we investigate AT’s benefits, drawbacks, and usability issues when deployed with the demonstrator tool, Scout.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We conducted an experiment involving 13 industry professionals, from six companies, comparing AT to manual GUI-based regression testing. These results were complemented by interviews and Bayesian data analysis (BDA) of the study’s quantitative results.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>The results of the Bayesian data analysis revealed that the use of AT shortens test durations in 70% of the cases on average, concluding that AT is more efficient. When comparing the means of the total duration to perform all tests, AT reduced the test duration by 36% in total. Participant interviews highlighted nine benefits and eleven drawbacks of AT, while observations revealed four usability issues.</p><h3 data-test=\"abstract-sub-heading\">Conclusion</h3><p>This study presents empirical evidence of improved efficiency using AT in the context of manual GUI-based regression testing. We further report AT’s benefits, drawbacks, and usability issues. The majority of identified usability issues and drawbacks can be attributed to the tool implementation of AT and, thus, can serve as valuable input for future tool development.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"59 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The best ends by the best means: ethical concerns in app reviews","authors":"Neelam Tjikhoeri, Lauren Olson, Emitzá Guzmán","doi":"10.1007/s10664-024-10463-7","DOIUrl":"https://doi.org/10.1007/s10664-024-10463-7","url":null,"abstract":"<p>This work analyzes ethical concerns found in users’ app store reviews. We performed this study because ethical concerns in mobile applications (apps) are widespread, pose severe threats to end users and society, and lack systematic analysis and methods for detection and classification. In addition, app store reviews allow practitioners to collect users’ perspectives, crucial for identifying software flaws, from a geographically distributed and large-scale audience. For our analysis, we collected five million user reviews, developed a set of ethical concerns representative of user preferences, and manually labeled a sample of these reviews. We found that (1) users highly report ethical concerns about censorship, identity theft, and safety (2) user reviews with ethical concerns are longer, more popular, and lowly rated, and (3) there is high automation potential for the classification and filtering of these reviews. Our results highlight the relevance of using app store reviews for the systematic consideration of ethical concerns during software evolution.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"4 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Static analysis driven enhancements for comprehension in machine learning notebooks"
Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Mouli Chekkapalli, Jiawei Wang, Li Li, Eric Bodden
Empirical Software Engineering, published 2024-08-12. DOI: 10.1007/s10664-024-10525-w

Jupyter notebooks have emerged as the predominant tool for data scientists to develop and share machine learning solutions, primarily using Python as the programming language. Despite their widespread adoption, a significant fraction of these notebooks, when shared on public repositories, suffer from insufficient documentation and a lack of coherent narrative. Such shortcomings compromise the readability and understandability of the notebook. Addressing this shortcoming, this paper introduces HeaderGen, a tool-based approach that automatically augments code cells in these notebooks with descriptive markdown headers, derived from a predefined taxonomy of machine learning operations. Additionally, it systematically classifies and displays function calls in line with this taxonomy. The mechanism that powers HeaderGen is an enhanced call graph analysis technique, building upon the foundational analysis available in PyCG. To improve precision, HeaderGen extends PyCG's analysis with return-type resolution of external function calls, type inference, and flow-sensitivity. Furthermore, leveraging type information, HeaderGen employs pattern matching techniques on the code syntax to annotate code cells. We conducted an empirical evaluation on 15 real-world Jupyter notebooks sourced from Kaggle. The results indicate a high accuracy in call graph analysis, with precision at 95.6% and recall at 95.3%. The header generation has a precision of 85.7% and a recall of 92.8% with regard to headers created manually by experts. A user study corroborated the practical utility of HeaderGen, revealing that users found HeaderGen useful in tasks related to comprehension and navigation. To further evaluate the type inference capability of static analysis tools, we introduce TypeEvalPy, a framework for evaluating type inference tools for Python with a built-in micro-benchmark containing 154 code snippets and 845 type annotations in the ground truth. Our comparative analysis of four tools revealed that HeaderGen outperforms the other tools in exact matches with the ground truth.
{"title":"Causal inference of server- and client-side code smells in web apps evolution","authors":"Américo Rio, Fernando Brito e Abreu, Diana Mendes","doi":"10.1007/s10664-024-10478-0","DOIUrl":"https://doi.org/10.1007/s10664-024-10478-0","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Code smells (CS) are symptoms of poor design and implementation choices that may lead to increased defect incidence, decreased code comprehension, and longer times to release. Web applications and systems are seldom studied, probably due to the heterogeneity of platforms (server and client-side) and languages, and to study web code smells, we need to consider CS covering that diversity. Furthermore, the literature provides little evidence for the claim that CS are a symptom of poor design, leading to future problems in web apps.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>To study the quantitative evolution and inner relationship of CS in web apps on the server- and client-sides, and their impact on maintainability and app time-to-release (TTR).</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We collected and analyzed 18 server-side, and 12 client-side code smells, aka web smells, from consecutive official releases of 12 PHP typical web apps, i.e., with server- and client-code in the same code base, summing 811 releases. Additionally, we collected metrics, maintenance issues, reported bugs, and release dates. We used several methodologies to devise causality relationships among the considered irregular time series, such as Granger-causality and Information Transfer Entropy(TE) with CS from previous one to four releases (lag 1 to 4).</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>The CS typically evolve the same way inside their group and its possible to analyze them as groups. The CS group trends are: Server, slowly decreasing; Client-side embed, decreasing and JavaScript,increasing. Studying the relationship between CS groups we found that the \"lack of code quality\", measured with CS density proxies, propagates from client code to server code and JavaScript in half of the applications. We found causality relationships between CS and issues. We also found causality from CS groups to bugs in Lag 1, decreasing in the subsequent lags. The values are 15% (lag1), 10% (lag2), and then decrease. The group of client-side embed CS still impacts up to 3 releases before. In group analysis, server-side CS and JavaScript contribute more to bugs. There are causality relationships from individual CS to TTR on lag 1, decreasing on lag 2, and from all CS groups to TTR in lag1, decreasing in the other lags, except for client CS.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>There is statistical inference between CS groups. There is also evidence of statistical inference from the CS to web applications’ issues, bugs, and TTR. Client and server-side CS contribute globally to the quality of web applications, this contribution is low, but significant. 
Depending on the outcome variable (issues, bugs, time-to-release), the contribution quantity from CS is between 10% and 20%.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"76 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
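Granger-causality tests of the kind used in the study can be run on per-release time series with statsmodels. The sketch below uses synthetic series standing in for code-smell density and bug counts; in the actual study the series come from 811 releases, and a real analysis would also need the usual stationarity checks before the test results are meaningful.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic per-release series (placeholders for CS density and bug counts).
rng = np.random.default_rng(0)
smell_density = rng.normal(10, 2, size=60)
bugs = 0.8 * np.roll(smell_density, 1) + rng.normal(0, 1, size=60)  # bugs follow smells by one release

data = pd.DataFrame({"bugs": bugs, "smell_density": smell_density})

# Does smell_density Granger-cause bugs at lags 1..4?
# (the second column is tested as a predictor of the first)
results = grangercausalitytests(data[["bugs", "smell_density"]], maxlag=4)
for lag, (tests, _) in results.items():
    print(f"lag {lag}: ssr F-test p-value = {tests['ssr_ftest'][1]:.4f}")
```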
{"title":"On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools","authors":"Aurora Papotti, Ranindya Paramitha, Fabio Massacci","doi":"10.1007/s10664-024-10506-z","DOIUrl":"https://doi.org/10.1007/s10664-024-10506-z","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Objective</h3><p>We investigated whether (possibly wrong) security patches suggested by Automated Program Repairs (APR) for real world projects are recognized by human reviewers. We also investigated whether knowing that a patch was produced by an allegedly specialized tool does change the decision of human reviewers.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We perform an experiment with <span>(n= 72)</span> Master students in Computer Science. In the first phase, using a balanced design, we propose to human reviewers a combination of patches proposed by APR tools for different vulnerabilities and ask reviewers to adopt or reject the proposed patches. In the second phase, we tell participants that some of the proposed patches were generated by security-specialized tools (even if the tool was actually a ‘normal’ APR tool) and measure whether the human reviewers would change their decision to adopt or reject a patch.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>It is easier to identify wrong patches than correct patches, and correct patches are not confused with partially correct patches. Also patches from APR Security tools are adopted more often than patches suggested by generic APR tools but there is not enough evidence to verify if ‘bogus’ security claims are distinguishable from ‘true security’ claims. Finally, the number of switches to the patches suggested by security tool is significantly higher after the security information is revealed irrespective of correctness.</p><h3 data-test=\"abstract-sub-heading\">Limitations</h3><p>The experiment was conducted in an academic setting, and focused on a limited sample of popular APR tools and popular vulnerability types.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"52 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141881763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction"
Hareem Sahar, Abdul Ali Bangash, Abram Hindle, Denilson Barbosa
Empirical Software Engineering, published 2024-08-02. DOI: 10.1007/s10664-024-10514-z

Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time. Current software defect prediction approaches rely on manually crafted features such as change metrics and involve machine learning or deep learning models that are expensive to train. These models typically involve extensive training processes that may require significant computational resources and time. These characteristics can pose challenges when attempting to update the models in real time as new examples become available, potentially impacting their suitability for fast online defect prediction. Furthermore, the reliance on a complex underlying model often makes these approaches less explainable, meaning developers cannot understand the reasons behind the models' predictions. An approach that is not explainable might not be adopted in real-life development environments because of developers' lack of trust in its results. To address these limitations, we propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits. The IRJIT approach is online and explainable: it can learn from new data without expensive retraining, and developers can see the documents that support a prediction, providing additional context. By evaluating 10 open-source datasets in a within-project setting, we show that our approach is up to 112 times faster than state-of-the-art ML and DL approaches, offers explainability at the commit and line level, and has performance comparable to the state-of-the-art.
"Industrial adoption of machine learning techniques for early identification of invalid bug reports"
Muhammad Laiq, Nauman bin Ali, Jürgen Börstler, Emelie Engström
Empirical Software Engineering, published 2024-07-31. DOI: 10.1007/s10664-024-10502-3

Despite the accuracy of machine learning (ML) techniques in predicting invalid bug reports, as shown in earlier research, and the importance of early identification of invalid bug reports in software maintenance, the adoption of ML techniques for this task in industrial practice is yet to be investigated. In this study, we used a technology transfer model to guide the adoption of an ML technique at a company for the early identification of invalid bug reports. In the process, we also identify necessary conditions for adopting such techniques in practice. We followed a case study research approach with various design and analysis iterations for the technology transfer activities. We collected data from bug repositories, through focus groups, a questionnaire, and a presentation and feedback session with an expert. As expected, we found that an ML technique can identify invalid bug reports with acceptable accuracy at an early stage. However, the technique's accuracy drops over time in operational use due to changes in the product, the technologies used, or the development organization. Such changes may require retraining the ML model. During validation, practitioners highlighted the need to understand the ML technique's predictions in order to trust them. We found that a visual and descriptive explanation of a prediction (using a state-of-the-art ML interpretation framework) increases the trustworthiness of the technique compared to just presenting the results of the validity predictions. We conclude that trustworthiness, integration with the existing toolchain, and maintaining the technique's accuracy over time are critical for increasing the likelihood of adoption.