R. Meloca, G. Pinto, Leonardo Baiser, Marco Mattos, Ivanilton Polato, I. Wiese, D. Germán
{"title":"Understanding the Usage, Impact, and Adoption of Non-OSI Approved Licenses","authors":"R. Meloca, G. Pinto, Leonardo Baiser, Marco Mattos, Ivanilton Polato, I. Wiese, D. Germán","doi":"10.1145/3196398.3196427","DOIUrl":"https://doi.org/10.1145/3196398.3196427","url":null,"abstract":"The software license is one of the most important non-executable pieces of any software system. However, due to its non-technical nature, developers often misuse or misunderstand software licenses. Although previous studies reported problems related to licenses clashes and inconsistencies, in this paper we shed the light on an important but yet overlooked issue: the use of non-approved open-source licenses. Such licenses claim to be open-source, but have not been formally approved by the Open Source Initiative (OSI). When a developer releases a software under a non-approved license, even if the interest is to make it open-source, the original author might not be granting the rights required by those who use the software. To uncover the reasons behind the use of non-approved licenses, we conducted a mix-method study, mining data from 657K open-source projects and their 4,367K versions, and surveying 76 developers that published some of these projects. Although 1,058,554 of the project versions employ at least one non-approved license, non-approved licenses account for 21.51% of license usage. We also observed that it is not uncommon for developers to change from a non-approved to an approved license. When asked, some developers mentioned that this transition was due to a better understanding of the disadvantages of using an non-approved license. This perspective is particularly important since developers often rely on package managers to easily and quickly get their dependencies working.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"56 1","pages":"270-280"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89229995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michele Tufano, Cody Watson, G. Bavota, M. D. Penta, Martin White, D. Poshyvanyk
{"title":"Deep Learning Similarities from Different Representations of Source Code","authors":"Michele Tufano, Cody Watson, G. Bavota, M. D. Penta, Martin White, D. Poshyvanyk","doi":"10.1145/3196398.3196431","DOIUrl":"https://doi.org/10.1145/3196398.3196431","url":null,"abstract":"Assessing the similarity between code components plays a pivotal role in a number of Software Engineering (SE) tasks, such as clone detection, impact analysis, refactoring, etc. Code similarity is generally measured by relying on manually defined or hand-crafted features, e.g., by analyzing the overlap among identifiers or comparing the Abstract Syntax Trees of two code components. These features represent a best guess at what SE researchers can utilize to exploit and reliably assess code similarity for a given task. Recent work has shown, when using a stream of identifiers to represent the code, that Deep Learning (DL) can effectively replace manual feature engineering for the task of clone detection. However, source code can be represented at different levels of abstraction: identifiers, Abstract Syntax Trees, Control Flow Graphs, and Bytecode. We conjecture that each code representation can provide a different, yet orthogonal view of the same code fragment, thus, enabling a more reliable detection of similarities in code. In this paper, we demonstrate how SE tasks can benefit from a DL-based approach, which can automatically learn code similarities from different representations.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"125 1","pages":"542-553"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77170197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Widder, Michael C Hilton, Christian Kästner, Bogdan Vasilescu
{"title":"I'm Leaving You, Travis: A Continuous Integration Breakup Story","authors":"D. Widder, Michael C Hilton, Christian Kästner, Bogdan Vasilescu","doi":"10.1145/3196398.3196422","DOIUrl":"https://doi.org/10.1145/3196398.3196422","url":null,"abstract":"Continuous Integration (CI) services, which can automatically build, test, and deploy software projects, are an invaluable asset in distributed teams, increasing productivity and helping to maintain code quality. Prior work has shown that CI pipelines can be sophisticated, and choosing and configuring a CI system involves tradeoffs. As CI technology matures, new CI tool offerings arise to meet the distinct wants and needs of software teams, as they negotiate a path through these tradeoffs, depending on their context. In this paper, we begin to uncover these nuances, and tell the story of open-source projects falling out of love with Travis, the earliest and most popular cloud-based CI system. Using logistic regression, we quantify the effects that open-source community factors and project technical factors have on the rate of Travis abandonment. We find that increased build complexity reduces the chances of abandonment, that larger projects abandon at higher rates, and that a project's dominant language has significant but varying effects. Finally, we find the surprising result that metrics of configuration attempts and knowledge dispersion in the project do not appear to affect the rate of abandonment.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"26 1","pages":"165-169"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74144137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Predicting Architectural Significance of Implementation Issues","authors":"Arman Shahbazian, Daye Nam, N. Medvidović","doi":"10.1145/3196398.3196440","DOIUrl":"https://doi.org/10.1145/3196398.3196440","url":null,"abstract":"In a software system's development lifecycle, engineers make numerous design decisions that subsequently cause architectural change in the system. Previous studies have shown that, more often than not, these architectural changes are unintentional by-products of continual software maintenance tasks. The result of inadvertent architectural changes is accumulation of technical debt and deterioration of software quality. Despite their important implications, there is a relative shortage of techniques, tools, and empirical studies pertaining to architectural design decisions. In this paper, we take a step toward addressing that scarcity by using the information in the issue and code repositories of open-source software systems to investigate the cause and frequency of such architectural design decisions. Furthermore, building on these results, we develop a predictive model that is able to identify the architectural significance of newly submitted issues, thereby helping engineers to prevent the adverse effects of architectural decay. The results of this study are based on the analysis of 21,062 issues affecting 301 versions of 5 large open-source systems for which the code changes and issues were publicly accessible.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"169 1","pages":"215-219"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76120621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Profiling Call Changes Via Motif Mining","authors":"B. Russo","doi":"10.1145/3196398.3196426","DOIUrl":"https://doi.org/10.1145/3196398.3196426","url":null,"abstract":"Components' interactions in software systems evolve over time increasing in complexity and size. Developers might have hard time to master such complexity during their maintenance activities incrementing the risk to make mistakes. Understanding changes of such interactions helps developer plan their re-factoring activities. In this study, we propose a method to study the occurrence of motifs in call graphs and their role in the evolution of a system. In our settings, motifs are patterns of class calls that can arise for many reasons as, for example, by implementing design choices. By mining motifs of the call graph obtained from each system's release, we were able to profile the evolution of 68 releases of five open source systems and show that 1) systems have common motifs that occur non-randomly and persistently over their releases, 2) motifs can be used to describe the evolution of calls, compare systems and eventually reveal releases that underwent major changes, 3) there are no specific motif types that include design patterns in all systems under study, but each system has motifs that likely include them, motifs that do not include them at all, and motifs that include a design pattern and occur only once in every release. Some of the findings resemble the ones for biological / physical systems and, as such, path the way to study the evolution of call graphs as dynamical systems (i.e., as system regulated by analytic functions).","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"40 1","pages":"203-214"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75035115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Justin Middleton, E. Murphy-Hill, Demetrius Green, A. Meade, R. Mayer, D. White, Steve McDonald
{"title":"Which Contributions Predict Whether Developers are Accepted into GitHub Teams","authors":"Justin Middleton, E. Murphy-Hill, Demetrius Green, A. Meade, R. Mayer, D. White, Steve McDonald","doi":"10.1145/3196398.3196429","DOIUrl":"https://doi.org/10.1145/3196398.3196429","url":null,"abstract":"Open-source software (OSS) often evolves from volunteer contributions, so OSS development teams must cooperate with their communities to attract new developers. However, in view of the myriad ways that developers interact over platforms for OSS development, observers of these communities may have trouble discerning, and thus learning from, the successful patterns of developer-to-team interactions that lead to eventual team acceptance. In this work, we study project communities on GitHub to discover which forms of software contribution characterize developers who begin as development team outsiders and eventually join the team, in contrast to developers who remain team outsiders. From this, we identify and compare the forms of contribution, such as pull requests and several forms of discussion comments, that influence whether new developers join OSS teams, and we discuss the implications that these behavioral patterns have for the focus of designers and educators.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"36 1","pages":"403-413"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81240098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Location Using Crowd-Based Screencasts","authors":"P. Moslehi, Bram Adams, J. Rilling","doi":"10.1145/3196398.3196439","DOIUrl":"https://doi.org/10.1145/3196398.3196439","url":null,"abstract":"Crowd-based multi-media documents such as screencasts have emerged as a source for documenting requirements of agile software projects. For example, screencasts can describe buggy scenarios of a software product, or present new features in an upcoming release. Unfortunately, the binary format of videos makes traceability between the video content and other related software artifacts (e.g., source code, bug reports) difficult. In this paper, we propose an LDA-based feature location approach that takes as input a set of screencasts (i.e., the GUI text and/or spoken words) to establish traceability link between the features described in the screencasts and source code fragments implementing them. We report on a case study conducted on 10 WordPress screencasts, to evaluate the applicability of our approach in linking these screencasts to their relevant source code artifacts. We find that the approach is able to successfully pinpoint relevant source code files at the top 10 hits using speech and GUI text. We also found that term frequency rebalancing can reduce noise and yield more precise results.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"63 1","pages":"192-202"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82341118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Hidden Cost of Code Completion: Understanding the Impact of the Recommendation-List Length on its Efficiency","authors":"Xianhao Jin, Francisco Servant","doi":"10.1145/3196398.3196474","DOIUrl":"https://doi.org/10.1145/3196398.3196474","url":null,"abstract":"Automatic code completion is a useful and popular technique that software developers use to write code more effectively and efficiently. However, while the benefits of code completion are clear, its cost is yet not well understood. We hypothesize the existence of a hidden cost of code completion, which mostly impacts developers when code completion techniques produce long recommendations. We study this hidden cost of code completion by evaluating how the length of the recommendation list affects other factors that may cause inefficiencies in the process. We study how common long recommendations are, whether they often provide low-ranked correct items, whether they incur longer time to be assessed, and whether they were more prevalent when developers did not select any item in the list. In our study, we observe evidence for all these factors, confirming the existence of a hidden cost of code completion.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"49 1","pages":"70-73"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86091358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment","authors":"Christoph Laaber, P. Leitner","doi":"10.1145/3196398.3196407","DOIUrl":"https://doi.org/10.1145/3196398.3196407","url":null,"abstract":"Continuous integration (CI) emphasizes quick feedback to developers. This is at odds with current practice of performance testing, which predominantely focuses on long-running tests against entire systems in production-like environments. Alternatively, software microbenchmarking attempts to establish a performance baseline for small code fragments in short time. This paper investigates the quality of microbenchmark suites with a focus on suitability to deliver quick performance feedback and CI integration. We study ten open-source libraries written in Java and Go with benchmark suite sizes ranging from 16 to 983 tests, and runtimes between 11 minutes and 8.75 hours. We show that our study subjects include benchmarks with result variability of 50% or higher, indicating that not all benchmarks are useful for reliable discovery of slowdowns. We further artificially inject actual slowdowns into public API methods of the study subjects and test whether test suites are able to discover them. We introduce a performance-test quality metric called the API benchmarking score (ABS). ABS represents a benchmark suite's ability to find slowdowns among a set of defined core API methods. Resulting benchmarking scores (i.e., fraction of discovered slowdowns) vary between 10% and 100% for the study subjects. This paper's methodology and results can be used to (1) assess the quality of existing microbenchmark suites, (2) select a set of tests to be run as part of CI, and (3) suggest or generate benchmarks for currently untested parts of an API.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"119-130"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89430274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maleknaz Nayebi, Konstantin Kuznetsov, Paul Chen, A. Zeller, G. Ruhe
{"title":"Anatomy of Functionality Deletion: An Exploratory Study on Mobile Apps","authors":"Maleknaz Nayebi, Konstantin Kuznetsov, Paul Chen, A. Zeller, G. Ruhe","doi":"10.1145/3196398.3196410","DOIUrl":"https://doi.org/10.1145/3196398.3196410","url":null,"abstract":"One of Lehman's laws of software evolution is that the functionality of programs has to increase over time to maintain user satisfaction. In the domain of mobile apps, though, too much functionality can easily impact usability, resource consumption, and maintenance effort. Hence, does the law of continuous growth apply there? This paper shows that in mobile apps, deletion of functionality is actually common, challenging Lehman's law. We analyzed user driven requests for deletions which were found in 213,866 commits from 1,519 open source Android mobile apps from a total of 14,238 releases. We applied hybrid (open and closed) card sorting and created taxonomies for nature and causes of deletions. We found that functionality deletions are mostly motivated by unneeded functionality, poor user experience, and compatibility issues. We also performed a survey with 106 mobile app developers. We found that 78.3% of developers consider deletion of functionality to be equally or more important than the addition of new functionality. Developers confirmed that they plan for deletions. This implies the need to re-think the process of planning for the next release, overcoming the simplistic assumptions to exclusively look at adding functionality to maximize the value of upcoming releases. Our work is the first to study the phenomenon of functionality deletion and opens the door to a wider perspective on software evolution.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"2 1","pages":"243-253"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82943920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}