2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR): Latest Papers

A Large-Scale Study on Repetitiveness, Containment, and Composability of Routines in Open-Source Projects

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901759

A. Nguyen, H. Nguyen, T. Nguyen

Abstract: Source code in software systems has been shown to have a good degree of repetitiveness at the lexical, syntactical, and API usage levels. This paper presents a large-scale study on the repetitiveness, containment, and composability of source code at the semantic level. We collected a large dataset consisting of 9,224 Java projects with 2.79M class files, 17.54M methods with 187M SLOCs. For each method in a project, we build the program dependency graph (PDG) to represent a routine, and compare PDGs with one another as well as the subgraphs within them. We found that within a project, 12.1% of the routines are repeated, and most of them repeat from 2–7 times. As entirety, the routines are quite project-specific with only 3.3% of them exactly repeating in 1–4 other projects with at most 8 times. We also found that 26.1% and 7.27% of the routines are contained in other routine(s), i.e., implemented as part of other routine(s) elsewhere within a project and in other projects, respectively. Except for trivial routines, their repetitiveness and containment is independent of their complexity. Defining a subroutine via a per-variable slicing subgraph in a PDG, we found that 14.3% of all routines have all of their subroutines repeated. A high percentage of subroutines in a routine can be found/reused elsewhere. We collected 8,764,971 unique subroutines (with 323,564 unique JDK subroutines) as basic units for code searching/synthesis. We also provide practical implications of our findings to automated tools.

Pages: 362-373

Citations: 16

Sentiment Analysis in Tickets for IT Support

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901781

Cássio Castaldi Araújo Blaz, Karin Becker

Citations: 36

Feature Toggles: Practitioner Practices and a Case Study

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901745

Md Tajmilur Rahman, Louis-Philippe Querel, Peter C. Rigby, Bram Adams

Abstract: Continuous delivery and rapid releases have led to innovative techniques for integrating new features and bug fixes into a new release faster. To reduce the probability of integration conflicts, major software companies, including Google, Facebook and Netflix, use feature toggles to incrementally integrate and test new features instead of integrating the feature only when it’s ready. Even after release, feature toggles allow operations managers to quickly disable a new feature that is behaving erratically or to enable certain features only for certain groups of customers. Since literature on feature toggles is surprisingly slim, this paper tries to understand the prevalence and impact of feature toggles. First, we conducted a quantitative analysis of feature toggle usage across 39 releases of Google Chrome (spanning five years of release history). Then, we studied the technical debt involved with feature toggles by mining a spreadsheet used by Google developers for feature toggle maintenance. Finally, we performed thematic analysis of videos and blog posts of release engineers at major software companies in order to further understand the strengths and drawbacks of feature toggles in practice. We also validated our findings with four Google developers. We find that toggles can reconcile rapid releases with long-term feature development and allow flexible control over which features to deploy. However, they also introduce technical debt and additional maintenance for developers.

Pages: 201-211
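The toggle mechanism this abstract describes (integrating a feature while it is still off, disabling it quickly after release, or enabling it only for some customer groups) can be illustrated with a minimal Python sketch. It is not taken from the paper or from Chrome; the `FeatureToggles` class, the feature names, and the group names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """In-memory toggle registry (hypothetical, for illustration only)."""

    # Feature name -> whether the feature is switched on at all.
    enabled: dict[str, bool] = field(default_factory=dict)
    # Optional per-feature allow-list of customer groups.
    groups: dict[str, set[str]] = field(default_factory=dict)

    def is_on(self, feature: str, group: str | None = None) -> bool:
        # Unknown features default to off, so unfinished code stays dark.
        if not self.enabled.get(feature, False):
            return False
        allowed = self.groups.get(feature)
        # No allow-list means the feature is on for every group.
        return allowed is None or group in allowed


toggles = FeatureToggles(
    enabled={"new_checkout": True, "beta_search": True},
    groups={"beta_search": {"staff"}},
)


def checkout(group: str) -> str:
    # Both code paths are integrated; the toggle picks one at run time.
    return "new flow" if toggles.is_on("new_checkout", group) else "old flow"
```

Setting `toggles.enabled["new_checkout"] = False` switches every caller back to the old path without a redeploy. The sketch also hints at the maintenance cost the abstract mentions: the old path and the toggle check both remain in the code until someone removes them.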

Citations: 60

Using Dynamic and Contextual Features to Predict Issue Lifetime in GitHub Projects

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901751

R. Kikas, M. Dumas, Dietmar Pfahl

Abstract: Methods for predicting issue lifetime can help software project managers to prioritize issues and allocate resources accordingly. Previous studies on issue lifetime prediction have focused on models built from static features, meaning features calculated at one snapshot of the issue's lifetime based on data associated to the issue itself. However, during its lifetime, an issue typically receives comments from various stakeholders, which may carry valuable insights into its perceived priority and difficulty and may thus be exploited to update lifetime predictions. Moreover, the lifetime of an issue depends not only on characteristics of the issue itself, but also on the state of the project as a whole. Hence, issue lifetime prediction may benefit from taking into account features capturing the issue's context (contextual features). In this work, we analyze issues from more than 4000 GitHub projects and build models to predict, at different points in an issue's lifetime, whether or not the issue will close within a given calendric period, by combining static, dynamic and contextual features. The results show that dynamic and contextual features complement the predictive power of static ones, particularly for long-term predictions.

Pages: 291-302

Citations: 72

Mining Duplicate Questions of Stack Overflow

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901770

Md Ahasanuzzaman, M. Asaduzzaman, C. Roy, Kevin A. Schneider

Abstract: Stack Overflow is a popular question answering site that is focused on programming problems. Despite efforts to prevent asking questions that have already been answered, the site contains duplicate questions. This may cause developers to unnecessarily wait for a question to be answered when it has already been asked and answered. The site currently depends on its moderators and users with high reputation to manually mark those questions as duplicates, which not only results in delayed responses but also requires additional efforts. In this paper, we first perform a manual investigation to understand why users submit duplicate questions in Stack Overflow. Based on our manual investigation we propose a classification technique that uses a number of carefully chosen features to identify duplicate questions. Evaluation using a large number of questions shows that our technique can detect duplicate questions with reasonable accuracy. We also compare our technique with DupPredictor, a state-of-the-art technique for detecting duplicate questions, and we found that our proposed technique has a better recall rate than that technique.

Pages: 402-412

Citations: 107

AndroZoo: Collecting Millions of Android Apps for the Research Community

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2903508

Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

Citations: 696

Understanding the Exception Handling Strategies of Java Libraries: An Empirical Study

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901757

Demóstenes Sena, Roberta Coelho, U. Kulesza, R. Bonifácio

Abstract: This paper presents an empirical study whose goal was to investigate the exception handling strategies adopted by Java libraries and their potential impact on the client applications. In this study, exception flow analysis was used in combination with manual inspections in order: (i) to characterize the exception handling strategies of existing Java libraries from the perspective of their users; and (ii) to identify exception handling anti-patterns. We extended an existing static analysis tool to reason about exception flows and handler actions of 656 Java libraries selected from 145 categories in the Maven Central Repository. The study findings suggest a current trend of a high number of undocumented API runtime exceptions (i.e., @throws in Javadoc) and Unintended Handler problem. Moreover, we could also identify a considerable number of occurrences of exception handling anti-patterns (e.g. Catch and Ignore). Finally, we have also analyzed 647 bug issues of the 7 most popular libraries and identified that 20.71% of the reports are defects related to the problems of the exception strategies and anti-patterns identified in our study. The results of this study point to the need of tools to better understand and document the exception handling behavior of libraries.

Pages: 212-222
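The "Catch and Ignore" anti-pattern named in this abstract can be shown in a few lines. The sketch below uses Python rather than Java, so it is only an analogue of what the study measured in Java libraries; the function names `parse_port_ignoring` and `parse_port` are hypothetical.

```python
from __future__ import annotations


def parse_port_ignoring(text: str) -> int | None:
    # Anti-pattern: the exception is caught and silently dropped, so the
    # caller cannot tell a malformed value apart from a missing one.
    try:
        return int(text)
    except ValueError:
        pass
    return None


def parse_port(text: str) -> int:
    """Parse a port number.

    Raises:
        ValueError: if ``text`` is not a decimal integer.
    """
    try:
        return int(text)
    except ValueError as exc:
        # Re-raise with context and keep the original cause chained.
        raise ValueError(f"invalid port {text!r}") from exc
```

The second version also documents the exception in its docstring, which plays roughly the role that a Javadoc `@throws` entry plays for a Java API.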

Citations: 31

Mining the Modern Code Review Repositories: A Dataset of People, Process and Product

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2903504

Xin Yang, R. Kula, Norihiro Yoshida, Hajimu Iida

Citations: 51

The Unreasonable Effectiveness of Traditional Information Retrieval in Crash Report Deduplication

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2901766

Hazel Victoria Campbell, E. Santos, Abram Hindle

Abstract: Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of software systems available with Ubuntu. A variety of crash report bucketing methods are evaluated using data collected by Ubuntu’s Apport automated crash reporting system. The trade-off between precision and recall of numerous scalable crash deduplication techniques is explored. A set of criteria that a crash deduplication method must meet is presented and several methods that meet these criteria are evaluated on a new dataset. The evaluations presented in this paper show that using off-the-shelf information retrieval techniques, that were not designed to be used with crash reports, outperform other techniques which are specifically designed for the task of crash bucketing at realistic industrial scales. This research indicates that automated crash bucketing still has a lot of room for improvement, especially in terms of identifier tokenization.

Pages: 269-280
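As a rough illustration of what "off-the-shelf information retrieval" can mean for crash bucketing, the following Python sketch weights report tokens with TF-IDF and assigns each report to the first bucket whose representative is similar enough under cosine similarity. It is not the paper's evaluated method: the tokenizer, the smoothed IDF variant, the greedy first-match bucketing, the `0.5` threshold, and the sample reports are all assumptions made for this example.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(report: str) -> list[str]:
    # Naive tokenizer: identifiers such as gtk_widget_destroy stay whole.
    return re.findall(r"[a-z0-9_]+", report.lower())


def tfidf_vectors(reports: list[str]) -> list[dict[str, float]]:
    docs = [Counter(tokenize(r)) for r in reports]
    n = len(docs)
    # Document frequency: in how many reports each token appears.
    df = Counter(tok for d in docs for tok in d)
    # Smoothed IDF, so tokens found in every report keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bucket(reports: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedily group report indices; a bucket's first report represents it."""
    vectors = tfidf_vectors(reports)
    buckets: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for members in buckets:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing bucket is similar enough: start a new one.
            buckets.append([i])
    return buckets


reports = [
    "SIGSEGV in g_free from gtk_widget_destroy in nautilus",
    "SIGSEGV in g_free from gtk_widget_destroy in nautilus window",
    "AssertionError in apt_pkg open_cache python",
]
groups = bucket(reports)
```

Raising `threshold` splits buckets more aggressively and lowering it merges more reports, which is the precision and recall trade-off the abstract refers to. The tokenizer is deliberately simple; the abstract singles out identifier tokenization as an area with room for improvement.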

Citations: 24

Analyzing Developer Sentiment in Commit Logs

Pub Date: 2016-05-14 DOI: 10.1145/2901739.2903501

Vinayak Sinha, A. Lazar, Bonita Sharif

Citations: 90