Syful Islam, R. Kula, Christoph Treude, Bodin Chinthanet, T. Ishio, Kenichi Matsumoto
{"title":"Contrasting Third-Party Package Management User Experience","authors":"Syful Islam, R. Kula, Christoph Treude, Bodin Chinthanet, T. Ishio, Kenichi Matsumoto","doi":"10.26226/morressier.613b5419842293c031b5b64b","DOIUrl":"https://doi.org/10.26226/morressier.613b5419842293c031b5b64b","url":null,"abstract":"The management of third-party package dependencies is crucial to most technology stacks, with package managers acting as brokers to ensure that a verified package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of package ecosystems with their own management features. While recent studies have shown that developers struggle to migrate their dependencies, the common assumption is that package ecosystems are used without any issue. In this study, we explore 13 package ecosystems to understand whether their features correlate with the experience of their users. By studying experience through the questions that developers ask on the question-and-answer site Stack Overflow, we find that developer questions are grouped into three themes (i.e., Package management, Input-Output, and Package Usage). Our preliminary analysis indicates that specific features are correlated with the user experience. Our work lays out future directions to investigate the trade-offs involved in designing the ideal package ecosystem.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124835479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serena Elisa Ponta, Wolf Fischer, H. Plate, A. Sabetta
{"title":"The Used, the Bloated, and the Vulnerable: Reducing the Attack Surface of an Industrial Application","authors":"Serena Elisa Ponta, Wolf Fischer, H. Plate, A. Sabetta","doi":"10.26226/morressier.613b5418842293c031b5b612","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b612","url":null,"abstract":"Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency classes are necessary for an application to function correctly from those that could be safely removed. To do so, we conduct a case study on a real-world commercial Java application. Our study shows that the tools we used were able to correctly identify a considerable amount of redundant code, which could be removed without altering the results of the existing application tests. One of the redundant classes turned out to be (formerly) vulnerable, confirming that this technique has the potential to be applied for hardening purposes. However, by manually reviewing the results of our experiments, we observed that none of the tools can handle a widely used default mechanism for dynamic class loading.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131707034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Pogrebnoy, Ivan Kuznetsov, Yaroslav Golubev, Vladislav Tankov, T. Bryksin
{"title":"Sorrel: an IDE Plugin for Managing Licenses and Detecting License Incompatibilities","authors":"D. Pogrebnoy, Ivan Kuznetsov, Yaroslav Golubev, Vladislav Tankov, T. Bryksin","doi":"10.1109/ICSME52107.2021.00059","DOIUrl":"https://doi.org/10.1109/ICSME52107.2021.00059","url":null,"abstract":"Software development is a complex process that includes many different tasks besides just writing code. One of the aspects of software engineering is selecting and managing licenses for the given project. In this paper, we present SORREL-a plugin for managing licenses and detecting potential incompatibilities for IntelliJ IDEA, a popular Java IDE. The plugin scans the project in search of information about the project license and the licenses of its libraries. If the project does not yet have a license, the plugin provides the developer with recommendations for choosing the most suitable open license, and if there is a license, it informs the programmer about potential licensing violations. The tool makes it easier for developers to choose a proper license for a project and avoid most of the licensing errors-all inside the familiar IDE editor. The plugin and its source code are available online on GitHub: https://github.com/JetBrains-Research/sorrel. A demonstration video can be found at https://youtu.be/doUeAwPjcPE.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116004767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Oliver, Michael Homer, Jens Dietrich, C. Anslow
{"title":"A Partial Reproduction of A Guided Genetic Algorithm for Automated Crash Reproduction","authors":"P. Oliver, Michael Homer, Jens Dietrich, C. Anslow","doi":"10.26226/morressier.613b5418842293c031b5b5d3","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b5d3","url":null,"abstract":"This paper is a partial reproduction of work by Soltani et al. which presented EvoCrash, a tool for replicating software failures in Java by reproducing stack traces. EvoCrash uses a guided genetic algorithm to generate JUnit test cases capable of reproducing failures more reliably than existing coverage-based solutions. In this paper, we present the findings of our reproduction of the initial study exploring the effectiveness of EvoCrash and comparison to three existing solutions: STAR, JCHARMING, and MuCrash. We further explored the capabilities of EvoCrash on different programs to check for selection bias. We found that we can reproduce the crashes covered by EvoCrash in the original study while reproducing two additional crashes not reported as reproduced. We also find that EvoCrash was unsuccessful in reproducing several crashes from the JCHARMING paper, which were excluded from the original study. Both EvoCrash and JCHARMING could reproduce 73% of the crashes from the JCHARMING paper. We found that there was potentially some selection bias in the dataset for EvoCrash. We also found that some crashes had been reported as non-reproducible even when EvoCrash could reproduce them. We suggest this may be due to EvoCrash becoming stuck in a local optimum.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"179 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124093011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dialogue Management for Interactive API Search","authors":"Zachary Eberhart, Collin McMillan","doi":"10.26226/morressier.613b5418842293c031b5b5e8","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b5e8","url":null,"abstract":"API search involves finding components in an API that are relevant to a programming task. For example, a programmer may need a function in a C library that opens a new network connection, then another function that sends data across that connection. Unfortunately, programmers often have trouble finding the API components that they need. A strong scientific consensus is emerging towards developing interactive tool support that responds to conversational feedback, emulating the experience of asking a fellow human programmer for help. A major barrier to creating these interactive tools is implementing dialogue management for API search. Dialogue management involves determining how a system should respond to user input, such as whether to ask a clarification question or to display potential results. In this paper, we present a dialogue manager for interactive API search that considers search results and dialogue history to select efficient actions. We implement two dialogue policies: a hand-crafted policy and a policy optimized via reinforcement learning. We perform a synthetics evaluation and a human evaluation comparing the policies to a generic single-turn, top-N policy used by source code search engines.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117075245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble Models for Neural Source Code Summarization of Subroutines","authors":"Alexander LeClair, Aakash Bansal, Collin McMillan","doi":"10.26226/morressier.613b5418842293c031b5b62e","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b62e","url":null,"abstract":"A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches for code summarization are neural network-based solutions akin to seq2seq, graph2seq, and other encoder-decoder architectures. The input to the encoder is source code, while the decoder helps predict the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality - differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models to exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and provide an explanation for this boost. The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129006298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Mastropaolo, Emad Aghajani, L. Pascarella, G. Bavota
{"title":"An Empirical Study on Code Comment Completion","authors":"A. Mastropaolo, Emad Aghajani, L. Pascarella, G. Bavota","doi":"10.26226/morressier.613b5418842293c031b5b5da","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b5da","url":null,"abstract":"Code comments play a prominent role in program comprehension activities. However, source code is not always documented and code and comments not always co-evolve. To deal with these issues, researchers have proposed techniques to automatically generate comments documenting a given code at hand. The most recent works in the area applied deep learning (DL) techniques to support such a task. Despite the achieved advances, the empirical evaluations of these approaches show that they are still far from a performance level that would make them valuable for developers. We tackle a simpler and related problem: Code comment completion. Instead of generating a comment for a given code from scratch, we investigate the extent to which state-of-the-art techniques can help developers in writing comments faster. We present a large-scale study in which we empirically assess how a simple n-gram model and the recently proposed Text-To-Text Transfer Transformer (T5) architecture can perform in autocompleting a code comment the developer is typing. The achieved results show the superiority of the T5 model, despite the n-gram model being a competitive solution.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"289 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114382313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anthony S Peruma, V. Arnaoudova, Christian D. Newman
{"title":"IDEAL: An Open-Source Identifier Name Appraisal Tool","authors":"Anthony S Peruma, V. Arnaoudova, Christian D. Newman","doi":"10.26226/morressier.613b5418842293c031b5b5ca","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b5ca","url":null,"abstract":"Developers must comprehend the code they will maintain, meaning that the code must be legible and reasonably self-descriptive. Unfortunately, there is still a lack of research and tooling that supports developers in understanding their naming practices; whether the names they choose make sense, whether they are consistent, and whether they convey the information required of them. In this paper, we present IDEAL, a tool that will provide feedback to developers about their identifier naming practices. Among its planned features, it will support linguistic anti-pattern detection, which is what will be discussed in this paper. IDEAL is designed to, and will, be extended to cover further anti-patterns, naming structures, and practices in the near future. IDEAL is open-source and publicly available, with a demo video available at: https://youtu.be/fVoOYGe50zg","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"167 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113995289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Evaluation of Commit Message Generation Models: An Experimental Study","authors":"Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang","doi":"10.26226/morressier.613b5419842293c031b5b634","DOIUrl":"https://doi.org/10.26226/morressier.613b5419842293c031b5b634","url":null,"abstract":"Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods. (2) Most existing datasets are crawled only from Java repositories while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. Some models show better performance when the datasets are split by commit, while other models perform better when the datasets are split by timestamp or by project. Based on our findings, we conduct a human evaluation and find the BLEU metric that best correlates with the human scores for the task. We also collect a large-scale, information-rich, and multi-language commit message dataset MCMD and evaluate existing models on this dataset. Furthermore, we conduct extensive experiments under different dataset splitting strategies and suggest the suitable models under different scenarios. Based on the experimental results and findings, we provide feasible suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe this work can help practitioners and researchers better evaluate and select models for automatic commit message generation. Our source code and data are available at https://github.com/DeepSoftwareAnalytics/CommitMsgEmpirical.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115442192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Omar Faruk Rokon, Pei Yan, Risul Islam, M. Faloutsos
{"title":"Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity","authors":"Md Omar Faruk Rokon, Pei Yan, Risul Islam, M. Faloutsos","doi":"10.26226/morressier.613b5418842293c031b5b614","DOIUrl":"https://doi.org/10.26226/morressier.613b5418842293c031b5b614","url":null,"abstract":"How can we identify similar repositories and clusters among a large online archive, such as GitHub? Determining repository similarity is an essential building block in studying the dynamics and the evolution of such software ecosystems. The key challenge is to determine the right representation for the diverse repository features in a way that: (a) it captures all aspects of the available information, and (b) it is readily usable by ML algorithms. We propose Repo2Vec, a comprehensive embedding approach to represent a repository as a distributed vector by combining features from three types of information sources. As our key novelty, we consider three types of information: (a) metadata, (b) the structure of the repository, and (c) the source code. We also introduce a series of embedding approaches to represent and combine these information types into a single embedding. We evaluate our method with two real datasets from GitHub for a combined 1013 repositories. First, we show that our method outperforms previous methods in terms of precision (93 % vs 78 %), with nearly twice as many Strongly Similar repositories and 30 % fewer False Positives. Second, we show how Repo2Vec provides a solid basis for: (a) distinguishing between malware and benign repositories, and (b) identifying a meaningful hierarchical clustering. For example, we achieve 98 % precision, and 96 % recall in distinguishing malware and benign repositories. Overall, our work is a fundamental building block for enabling many repository analysis functions such as repository categorization by target platform or intention, detecting code-reuse and clones, and identifying lineage and evolution.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132186177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}