**How do annotations affect Java code readability?**
Eduardo Guerra, Everaldo Gomes, Jeferson Ferreira, Igor Wiese, Phyllipe Lima, Marco Gerosa, Paulo Meirelles
Empirical Software Engineering, published 2024-05-03. DOI: https://doi.org/10.1007/s10664-024-10460-w

**Context.** Code annotations have gained widespread popularity in programming languages, offering developers the ability to attach metadata to code elements in order to define custom behaviors. Many modern frameworks and APIs use annotations to make integration less verbose and to keep it closer to the corresponding code element. Despite these advantages, practitioners' anecdotal evidence suggests that annotations might negatively affect code readability.

**Objective.** To better understand this effect, this paper systematically investigates the relationship between code annotations and code readability.

**Method.** In a survey with software developers (n=332), we presented 15 pairs of Java code snippets with and without code annotations. These pairs were designed around five categories of annotation usage drawn from real-world Java frameworks and APIs. For each pair, survey participants selected the code snippet they considered more readable and answered an open question about how annotations affect the code's readability.

**Results.** Preferences were scattered for all categories of annotation usage, revealing no consensus among participants. The answers remained spread even when segregated by participants' programming or annotation-related experience. Nevertheless, some participants showed a consistent preference for or against annotations across all categories, which may indicate a personal preference. Our qualitative analysis of the open-ended answers revealed that participants often praise the impact of annotations on design, maintainability, and productivity, but express contrasting views on understandability and code clarity.

**Conclusions.** Software developers and API designers can consider our results when deciding whether to use annotations, equipped with the insight that developers hold contrasting views of the annotations' impact on code readability.
{"title":"Patterns of multi-container composition for service orchestration with Docker Compose","authors":"Kalvin Eng, Abram Hindle, Eleni Stroulia","doi":"10.1007/s10664-024-10462-8","DOIUrl":"https://doi.org/10.1007/s10664-024-10462-8","url":null,"abstract":"<p>Software design patterns present general code solutions to common software design problems. Modern software systems rely heavily on containers for running their constituent service components. Yet, despite the prevalence of ready-to-use Docker service images ready to participate in multi-container service compositions of applications, developers do not have much guidance on how to compose their own Docker service orchestrations. Thus in this work, we curate a dataset of successful projects that employ Docker Compose as an orchestration tool to run multiple service containers; then, we engage in qualitative and quantitative analysis of Docker Compose configurations. The collection of data and analysis enables the identification and naming of repeating multi-container composition patterns that are used in numerous successful open-source projects, much like software design patterns. These patterns highlight how software systems are orchestrated in the real-world and can give examples to anybody wishing to compose their own service orchestrations. These contributions also advance empirical research in software engineering patterns as evidence is provided about how Docker Compose is used.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"15 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
**Automatic bi-modal question title generation for Stack Overflow with prompt learning**
Shaoyu Yang, Xiang Chen, Ke Liu, Guang Yang, Chi Yu
Empirical Software Engineering, published 2024-05-03. DOI: https://doi.org/10.1007/s10664-024-10466-4

When drafting question posts for Stack Overflow, developers may not accurately summarize the core problem in the question title, which can prevent these questions from getting timely help. Improving the quality of question titles has therefore attracted wide attention from researchers. An initial study aimed to generate titles automatically by analyzing only the code snippets in the question body, ignoring the helpful information in the corresponding problem descriptions. We therefore propose SOTitle+, an approach that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body. We formalize title generation for different programming languages as separate but related tasks and use multi-task learning to solve them, fine-tuning the pre-trained language model CodeT5 to generate the titles. However, the inconsistent inputs and optimization objectives between the pre-training task and our investigated task may prevent fine-tuning from fully exploiting the knowledge of the pre-trained model. To address this issue, SOTitle+ further prompt-tunes CodeT5 with hybrid prompts (i.e., a mixture of hard and soft prompts). To verify the effectiveness of SOTitle+, we construct a large-scale, high-quality corpus from recent data dumps shared by Stack Overflow, comprising 179,119 high-quality question posts across six popular programming languages. Experimental results show that SOTitle+ significantly outperforms four state-of-the-art baselines in both automatic and human evaluation. Our ablation studies further confirm the effectiveness of SOTitle+'s component settings (bi-modal information, prompt learning, hybrid prompts, and multi-task learning). Our work indicates that considering bi-modal information and prompt learning in Stack Overflow title generation is a promising direction.
**An empirical study into the effects of transpilation on quantum circuit smells**
Manuel De Stefano, Dario Di Nucci, Fabio Palomba, Andrea De Lucia
Empirical Software Engineering, published 2024-05-02. DOI: https://doi.org/10.1007/s10664-024-10461-9

Quantum computing is a promising field that can solve complex problems beyond traditional computers' capabilities. Developing high-quality quantum software applications, a discipline called quantum software engineering, has recently gained attention. However, quantum software development faces challenges related to code quality. A recent study found that many open-source quantum programs are affected by quantum-specific code smells, with *long circuit* being the most common. While that study provided relevant insights into the prevalence of code smells in quantum circuits, it did not explore the potential effect of transpilation, a necessary step for executing quantum programs, on the emergence of code smells. Indeed, transpilation might alter the very characteristics used to detect the presence of a smell in a circuit. To address this limitation, we present a new study investigating the impact of transpilation on quantum-specific code smells and how different target gate sets affect the results. We conducted experiments on 17 open-source quantum programs alongside a set of 100 synthetic circuits. We found that transpilation can significantly alter the metrics used to detect code smells, introducing smells even into previously smell-free circuits, with the *long circuit* smell being the most susceptible to transpilation. Furthermore, the choice of gate set significantly influences the presence and severity of code smells in transpiled circuits, highlighting the need for careful gate set selection to mitigate their impact. These findings have implications for circuit optimization and high-quality quantum software development. Further research is needed to understand the consequences of code smells and their potential impact on quantum computations, considering the characteristics and constraints of different gate sets and hardware platforms.
{"title":"Comparative analysis of real issues in open-source machine learning projects","authors":"Tuan Dung Lai, Anj Simmons, Scott Barnett, Jean-Guy Schneider, Rajesh Vasa","doi":"10.1007/s10664-024-10467-3","DOIUrl":"https://doi.org/10.1007/s10664-024-10467-3","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>In the last decade of data-driven decision-making, Machine Learning (ML) systems reign supreme. Because of the different characteristics between ML and traditional Software Engineering systems, we do not know to what extent the issue-reporting needs are different, and to what extent these differences impact the issue resolution process.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>We aim to compare the differences between ML and non-ML issues in open-source applied AI projects in terms of resolution time and size of fix. This research aims to enhance the predictability of maintenance tasks by providing valuable insights for issue reporting and task scheduling activities.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We collect issue reports from Github repositories of open-source ML projects using an automatic approach, filter them using ML keywords and libraries, manually categorize them using an adapted deep learning bug taxonomy, and compare resolution time and fix size for ML and non-ML issues in a controlled sample.</p><h3 data-test=\"abstract-sub-heading\">Result</h3><p>147 ML issues and 147 non-ML issues are collected for analysis. We found that ML issues take more time to resolve than non-ML issues, the median difference is 14 days. There is no significant difference in terms of size of fix between ML and non-ML issues. No significant differences are found between different ML issue categories in terms of resolution time and size of fix.</p><h3 data-test=\"abstract-sub-heading\">Conclusion</h3><p>Our study provided evidence that the life cycle for ML issues is stretched, and thus further work is required to identify the reason. The results also highlighted the need for future work to design custom tooling to support faster resolution of ML issues.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"5 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoRT: Transformer-based code representations with self-supervision by predicting reserved words for code smell detection","authors":"Amal Alazba, Hamoud Aljamaan, Mohammad Alshayeb","doi":"10.1007/s10664-024-10445-9","DOIUrl":"https://doi.org/10.1007/s10664-024-10445-9","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Code smell detection is the process of identifying poorly designed and implemented code pieces. Machine learning-based approaches require enormous amounts of manually labeled data, which are costly and difficult to scale. Unsupervised semantic feature learning, or learning without manual annotation, is vital for effectively harvesting an enormous amount of available data.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>The objective of this study is to propose a new code smell detection approach that utilizes self-supervised learning to learn intermediate representations without the need for labels and then fine-tune these representations on multiple tasks.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We propose a Code Representation with Transformers (CoRT) to learn the semantic and structural features of the source code by training transformers to recognize masked reserved words that are applied to the code given as input. We empirically demonstrated that the defined proxy task provides a powerful method for learning semantic and structural features. We exhaustively evaluated our approach on four downstream tasks: detection of the Data Class, God Class, Feature Envy, and Long Method code smells. Moreover, we compare our results with those of two paradigms: supervised learning and a feature-based approach. Finally, we conducted a cross-project experiment to evaluate the generalizability of our method to unseen labeled data.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>The results indicate that the proposed method has a high detection performance for code smells. For instance, the detection performance of CoRT on Data Class achieved a score of F1 between 88.08–99.4, Area Under Curve (AUC) between 89.62–99.88, and Matthews Correlation Coefficient (MCC) between 75.28–98.8, while God Class achieved a value of F1 ranges from 86.32–99.03, AUC of 92.1–99.85, and MCC of 76.15–98.09. Compared with the baseline model and feature-based approach, CoRT achieved better detection performance and had a high capability to detect code smells in unseen datasets.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>The proposed method has been shown to be effective in detecting class-level, and method-level code smells.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"48 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140565389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Taxonomy of inline code comment smells","authors":"","doi":"10.1007/s10664-023-10425-5","DOIUrl":"https://doi.org/10.1007/s10664-023-10425-5","url":null,"abstract":"<h3>Abstract</h3> <p>Code comments play a vital role in source code comprehension and software maintainability. It is common for developers to write comments to explain a code snippet, and commenting code is generally considered a good practice in software engineering. However, low-quality comments can have a detrimental effect on software quality or be ineffective for code understanding. This study aims to create a taxonomy of inline code comment smells and determine how frequently each smell type occurs in software projects. We conducted a multivocal literature review to define the initial taxonomy of inline comment smells. Afterward, we manually labeled 2447 inline comments from eight open-source projects where half of them were Java, and another half were Python projects. We created a taxonomy of 11 inline code comment smell types and found out that the smells exist in both Java and Python projects with varying degrees. Moreover, we conducted an online survey with 41 software practitioners to learn their opinions on these smells and their impact on code comprehension and software maintainability. The survey respondents generally agreed with the taxonomy; however, they reported that some smell types might have a positive effect on code comprehension in certain scenarios. We also opened pull requests and issues fixing the comment smells in the sampled projects, where we got a 27% acceptance rate. We share our manually labeled dataset online and provide implications for software engineering practitioners, researchers, and educators.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"114 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140565365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
**ROBUST: 221 bugs in the Robot Operating System**
Christopher S. Timperley, Gijs van der Hoorn, André Santos, Harshavardhan Deshpande, Andrzej Wąsowski
Empirical Software Engineering, published 2024-03-23. DOI: https://doi.org/10.1007/s10664-024-10440-0

As robotic systems such as autonomous cars and delivery drones assume greater roles and responsibilities within society, the likelihood and impact of catastrophic software failure within those systems increase. To aid researchers in developing new methods to measure and assure the safety and quality of robotics software, we systematically curated a dataset of 221 bugs across 7 popular and diverse software systems implemented via the Robot Operating System (ROS). We produce historically accurate recreations of each of the 221 defective software versions in the form of Docker images, and use a grounded theory approach to examine and categorize their corresponding faults, failures, and fixes. Finally, we reflect on the implications of our findings and outline future research directions for the community.
**An empirical study of untangling patterns of two-class dependency cycles**
Qiong Feng, Shuwen Liu, Huan Ji, Xiaotian Ma, Peng Liang
Empirical Software Engineering, published 2024-03-12. DOI: https://doi.org/10.1007/s10664-023-10438-0

Dependency cycles pose a significant challenge to software quality and maintainability, yet there is limited understanding of how practitioners resolve them in real-world scenarios. This paper presents an empirical study of the recurring patterns software developers employ to resolve dependency cycles between two classes in practice. We analyzed data from 38 open-source projects across different domains and manually inspected hundreds of cycle-untangling cases. Our findings reveal that developers tend to employ five recurring patterns to address dependency cycles. The chosen pattern is determined not only by the dependency relations between the cyclic classes but also by their design context, i.e., how the cyclic classes depend on, or are depended upon by, their neighboring classes. Through this empirical study, we also discovered three common counterintuitive solutions that developers usually adopt when handling cycles. These recurring patterns and common counterintuitive solutions can serve as a taxonomy to improve developers' awareness and as learning material for software engineering students and inexperienced developers. Our results also suggest that, in addition to considering the internal structure of dependency cycles, automatic tools need to consider the design context of cycles to provide better support for refactoring them.
{"title":"Machine learning-based test smell detection","authors":"Valeria Pontillo, Dario Amoroso d’Aragona, Fabiano Pecorelli, Dario Di Nucci, Filomena Ferrucci, Fabio Palomba","doi":"10.1007/s10664-023-10436-2","DOIUrl":"https://doi.org/10.1007/s10664-023-10436-2","url":null,"abstract":"<p>Test smells are symptoms of sub-optimal design choices adopted when developing test cases. Previous studies have proved their harmfulness for test code maintainability and effectiveness. Therefore, researchers have been proposing automated, heuristic-based techniques to detect them. However, the performance of these detectors is still limited and dependent on tunable thresholds. We design and experiment with a novel test smell detection approach based on machine learning to detect four test smells. First, we develop the largest dataset of manually-validated test smells to enable experimentation. Afterward, we train six machine learners and assess their capabilities in within- and cross-project scenarios. Finally, we compare the ML-based approach with state-of-the-art heuristic-based techniques. The key findings of the study report a negative result. The performance of the machine learning-based detector is significantly better than heuristic-based techniques, but none of the learners able to overcome an average F-Measure of 51%. We further elaborate and discuss the reasons behind this negative result through a qualitative investigation into the current issues and challenges that prevent the appropriate detection of test smells, which allowed us to catalog the next steps that the research community may pursue to improve test smell detection techniques.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}