{"title":"Raising MSR Researchers: An Experience Report on Teaching a Graduate Seminar Course in Mining Software Repositories (MSR)","authors":"A. Hassan","doi":"10.1145/2901739.2901780","DOIUrl":"https://doi.org/10.1145/2901739.2901780","url":null,"abstract":"This experience report discusses my views on raising MSR researchers through a graduate-level seminar course. A key goal of this report is to kick start a discussion on this topic within our growing community. A discussion for which there is rarely a suitable venue. Yet, it is an essential discussion to have as a community grows, especially given the rapid growth of the MSR community over the past decade.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"21 1","pages":"121-125"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79935239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Chen, Weiyi Shang, Jinqiu Yang, A. Hassan, Michael W. Godfrey, Mohamed N. Nasser, P. Flora
{"title":"An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems","authors":"T. Chen, Weiyi Shang, Jinqiu Yang, A. Hassan, Michael W. Godfrey, Mohamed N. Nasser, P. Flora","doi":"10.1145/2901739.2901758","DOIUrl":"https://doi.org/10.1145/2901739.2901758","url":null,"abstract":"Databases have become one of the most important components in modern software systems. For example, web services, cloud computing systems, and online transaction processing systems all rely heavily on databases. To abstract the complexity of accessing a database, developers make use of Object-Relational Mapping (ORM) frameworks. ORM frameworks provide an abstraction layer between the application logic and the underlying database. Such abstraction layer automatically maps objects in Object-Oriented Languages to database records, which significantly reduces the amount of boilerplate code that needs to be written. Despite the advantages of using ORM frameworks, we observe several difficulties in maintaining ORM code (i.e., code that makes use of ORM frameworks) when cooperating with our industrial partner. After conducting studies on other open source systems, we find that such difficulties are common in other Java systems. Our study finds that i) ORM cannot completely encapsulate database accesses in objects or abstract the underlying database technology, thus may cause ORM code changes more scattered; ii) ORM code changes are more frequent than regular code, but there is a lack of tools that help developers verify ORM code at compilation time; iii) we find that changes to ORM code are more commonly due to performance or security reasons; however, traditional static code analyzers need to be extended to capture the peculiarities of ORM code in order to detect such problems. Our study highlights the hidden maintenance costs of using ORM frameworks, and provides some initial insights about potential approaches to help maintain ORM code. Future studies should carefully examine ORM code, especially given the rising use of ORM in modern software systems.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"120 1","pages":"165-176"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87813392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grouping Android Tag Synonyms on Stack Overflow","authors":"S. Beyer, M. Pinzger","doi":"10.1145/2901739.2901750","DOIUrl":"https://doi.org/10.1145/2901739.2901750","url":null,"abstract":"On Stack Overflow, more than 38,000 diverse tags are used to classify posts. The Stack Overflow community provides tag synonyms to reduce the number of tags that have the same or similar meaning. In our previous research, we used those synonym pairs to derive a number of strategies to create tag synonyms automatically.In this work, we continue this line of research and present an approach to group tag synonyms to meaningful topics. We represent our synonyms as directed, weighted graphs, and investigate several graph community detection algorithms to build meaningful groups of tags, also called tag communities.We apply our approach to the tags obtained from Android-related Stack Overflow posts and evaluate the resulting tag communities quantitatively with various community metrics. In addition, we evaluate our approach qualitatively through a manual inspection and comparison of a random sample of tag communities. Our results show that we can cluster the Android tags to 2,481 meaningful tag communities. We also show how these tag communities can be used to derive trends of topics of Android-related questions on Stack Overflow.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"16 1","pages":"430-440"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81031262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Mining Crowd-Based Speech Documentation","authors":"P. Moslehi, Bram Adams, J. Rilling","doi":"10.1145/2901739.2901771","DOIUrl":"https://doi.org/10.1145/2901739.2901771","url":null,"abstract":"Despite the globalization of software development, relevant documentation of a project, such as requirements and design documents, often still is missing, incomplete or outdated. However, parts of that documentation can be found outside the project, where it is fragmented across hundreds of textual web documents like blog posts, email messages and forum posts, as well as multimedia documents such as screencasts and podcasts. Since dissecting and filtering multimedia information based on its relevancy to a given project is an inherently difficult task, it is necessary to provide an automated approach for mining this crowd-based documentation. In this paper, we are interested in mining the speech part of YouTube screencasts, since this part typically contains the rationale and insights of a screencast. We introduce a methodology that transcribes and analyzes the transcribed text using various Information Extraction (IE) techniques, and present a case study to illustrate the applicability of our mining methodology. In this case study, we extract use case scenarios from WordPress tutorial videos and show how their content can supplement existing documentation. We then evaluate how well existing rankings of video content are able to pinpoint the most relevant videos for a given scenario.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"6 1","pages":"259-268"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90093468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jin Guo, Mona Rahimi, J. Cleland-Huang, A. Rasin, J. Hayes, Michael Vierhauser
{"title":"Cold-Start Software Analytics","authors":"Jin Guo, Mona Rahimi, J. Cleland-Huang, A. Rasin, J. Hayes, Michael Vierhauser","doi":"10.1145/2901739.2901740","DOIUrl":"https://doi.org/10.1145/2901739.2901740","url":null,"abstract":"Software project artifacts such as source code, requirements, and change logs represent a gold-mine of actionable information. As a result, software analytic solutions have been developed to mine repositories and answer questions such as \"who is the expert?,'' \"which classes are fault prone?,'' or even \"who are the domain experts for these fault-prone classes?'' Analytics often require training and configuring in order to maximize performance within the context of each project. A cold-start problem exists when a function is applied within a project context without first configuring the analytic functions on project-specific data. This scenario exists because of the non-trivial effort necessary to instrument a project environment with candidate tools and algorithms and to empirically evaluate alternate configurations. We address the cold-start problem by comparatively evaluating `best-of-breed' and `profile-driven' solutions, both of which reuse known configurations in new project contexts. We describe and evaluate our approach against 20 project datasets for the three analytic areas of artifact connectivity, fault-prediction, and finding the expert, and show that the best-of-breed approach outperformed the profile-driven approach in all three areas; however, while it delivered acceptable results for artifact connectivity and find the expert, both techniques underperformed for cold-start fault prediction.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"8 1","pages":"142-153"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82877783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Valerio Cosentino, Javier Luis Cánovas Izquierdo, Jordi Cabot
{"title":"Findings from GitHub: Methods, Datasets and Limitations","authors":"Valerio Cosentino, Javier Luis Cánovas Izquierdo, Jordi Cabot","doi":"10.1145/2901739.2901776","DOIUrl":"https://doi.org/10.1145/2901739.2901776","url":null,"abstract":"GitHub, one of the most popular social coding platforms, is the platform of reference when mining Open Source repositories to learn from past experiences. In the last years, a number of research papers have been published reporting findings based on data mined from GitHub. As the community continues to deepen in its understanding of software engineering thanks to the analysis performed on this platform, we believe it is worthwhile to reflect how research papers have addressed the task of mining GitHub repositories over the last years. In this regard, we present a meta-analysis of 93 research papers which addresses three main dimensions of those papers: i) the empirical methods employed, ii) the datasets they used and iii) the limitations reported. Results of our meta-analysis show some concerns regarding the dataset collection process and size, the low level of replicability, poor sampling techniques, lack of longitudinal studies and scarce variety of methodologies.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"57 1","pages":"137-141"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77126347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Got Technical Debt? Surfacing Elusive Technical Debt in Issue Trackers","authors":"S. Bellomo, R. Nord, I. Ozkaya, Mary Popeck","doi":"10.1145/2901739.2901754","DOIUrl":"https://doi.org/10.1145/2901739.2901754","url":null,"abstract":"Concretely communicating technical debt and its consequences is of common interest to both researchers and software engineers. In the absence of validated tools and techniques to achieve this goal with repeatable results, developers resort to ad hoc practices. Most commonly they report using issue trackers or their existing backlog management practices to capture and track technical debt. In a manual examination of 1,264 issues from four issue trackers from open source industry and government projects, we identified 109 examples of technical debt. Our study reveals that technical debt and its related concepts have entered the vernacular of developers as they discuss development tasks through issue trackers. Even when issues are not explicitly tagged as technical debt, it is possible to identify technical debt items in these issue trackers using a categorization method we developed. We use our results and data to motivate an improved definition and an approach to explicitly report technical debt in issue trackers.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"90 1","pages":"327-338"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80453649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Examining Programmer Practices for Locally Handling Exceptions","authors":"Mary Beth Kery, Claire Le Goues, B. Myers","doi":"10.1145/2901739.2903497","DOIUrl":"https://doi.org/10.1145/2901739.2903497","url":null,"abstract":"Many have argued that the current try/catch mechanism for handling exceptions in Java is flawed. A major complaint is that programmers often write minimal and low quality handlers. We used the Boa tool to examine a large number of Java projects on GitHub to provide empirical evidence about how programmers currently deal with exceptions. We found that programmers handle exceptions locally in catch blocks much of the time, rather than propagating by throwing an Exception. Programmers make heavy use of actions like Log, Print, Return, or Throw in catch blocks, and also frequently copy code between handlers. We found bad practices like empty catch blocks or catching Exception are indeed widespread. We discuss evidence that programmers may misjudge risk when catching Exception, and face a tension between handlers that directly address local program statement failure and handlers that consider the program-wide implications of an exception. Some of these issues might be ad-dressed by future tools which autocomplete more complete handlers.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"42 1","pages":"484-487"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73675466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Avery, K. Dam, Bastin Tony Roy Savarimuthu, A. Ghose
{"title":"Externalization of Software Behavior by the Mining of Norms","authors":"Daniel Avery, K. Dam, Bastin Tony Roy Savarimuthu, A. Ghose","doi":"10.1145/2901739.2901744","DOIUrl":"https://doi.org/10.1145/2901739.2901744","url":null,"abstract":"Open Source Software Development (OSSD) often suffers from conflicting views and actions due to the perceived flat and open ecology of an open source community. This often manifests itself as a lack of codified knowledge that is easily accessible for community members. How decisions are made and expectations of a software system are often described in detail through the many forms of social communications that take place within a community. These social interactions form norms which are influential in dictating what behaviors are expected in a community and of the system. In this paper, we provide a tool which mines these social interactions (in the form of bug reports) and extract norms of the system, externalizing this information into a codified form that allows others within the community to be aware of without having witnessed the social interactions.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"20 1","pages":"223-234"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74767863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Izquierdo-Cortazar, Lars Kurth, Jesus M. Gonzalez-Barahona, Santiago Dueñas, Nelson Sekitoleko
{"title":"Characterization of the Xen Project Code Review Process: an Experience Report","authors":"Daniel Izquierdo-Cortazar, Lars Kurth, Jesus M. Gonzalez-Barahona, Santiago Dueñas, Nelson Sekitoleko","doi":"10.1145/2901739.2901778","DOIUrl":"https://doi.org/10.1145/2901739.2901778","url":null,"abstract":"Many software development projects have introduced manda-tory code review for every change to the code. This meansthat the project needs to devote a significant effort to re-view all proposed changes, and that their merging into thecode base may get considerably delayed. Therefore, all thoseprojects need to understand how code review is working, andthe delays it is causing in time to merge.This is the case in the Xen project, which performs peerreview using mailing lists. During the first half of 2015, somepeople in the project observed a large and sustained increasein the number of messages related to code review, which hadstarted some years before. This observation led to concernson whether the code review process was having some trouble,and too large an impact on the overall development process.Those concerns were addressed with a quantitative study,which is presented in this paper. Based on the informa-tion in code review messages, some metrics were defined toinfer delays imposed by code review. The study producedquantitative data suitable for informed discussion, which theproject is using to understand its code review process, andto take decisions to improve it.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"242 1","pages":"386-390"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75100079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}