Confusion Detection in Code Reviews

Felipe Ebert, F. C. Filho, Nicole Novielli, Alexander Serebrenik
{"title":"Confusion Detection in Code Reviews","authors":"Felipe Ebert, F. C. Filho, Nicole Novielli, Alexander Serebrenik","doi":"10.1109/ICSME.2017.40","DOIUrl":null,"url":null,"abstract":"Code reviews are an important mechanism for assuring quality of source code changes. Reviewers can either add general comments pertaining to the entire change or pinpoint concerns or shortcomings about a specific part of the change using inline comments. Recent studies show that reviewers often do not understand the change being reviewed and its context.Our ultimate goal is to identify the factors that confuse code reviewers and understand how confusion impacts the efficiency and effectiveness of code review(er)s. As the first step towards this goal we focus on the identification of confusion in developers' comments. Based on an existing theoretical framework categorizing expressions of confusion, we manually classify 800 comments from code reviews of the Android project. We observe that confusion can be reasonably well-identified by humans: raters achieve moderate agreement (Fleiss' kappa 0.59 for the general comments and 0.49 for the inline ones). Then, for each kind of comment we build a series of automatic classifiers that, depending on the goals of the further analysis, can be trained to achieve high precision (0.875 for the general comments and 0.615 for the inline ones), high recall (0.944 for the general comments and 0.988 for the inline ones), or substantial precision and recall (0.696 and 0.542 for the general comments and 0.434 and 0.583 for the inline ones, respectively). These results motivate further research on the impact of confusion on the code review process. Moreover, other researchers can employ the proposed classifiers to analyze confusion in other contexts where software development-related discussions occur, such as mailing lists.","PeriodicalId":147888,"journal":{"name":"2017 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME.2017.40","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 37

Abstract

Code reviews are an important mechanism for assuring quality of source code changes. Reviewers can either add general comments pertaining to the entire change or pinpoint concerns or shortcomings about a specific part of the change using inline comments. Recent studies show that reviewers often do not understand the change being reviewed and its context. Our ultimate goal is to identify the factors that confuse code reviewers and understand how confusion impacts the efficiency and effectiveness of code review(er)s. As the first step towards this goal, we focus on the identification of confusion in developers' comments. Based on an existing theoretical framework categorizing expressions of confusion, we manually classify 800 comments from code reviews of the Android project. We observe that confusion can be reasonably well identified by humans: raters achieve moderate agreement (Fleiss' kappa 0.59 for the general comments and 0.49 for the inline ones). Then, for each kind of comment we build a series of automatic classifiers that, depending on the goals of the further analysis, can be trained to achieve high precision (0.875 for the general comments and 0.615 for the inline ones), high recall (0.944 for the general comments and 0.988 for the inline ones), or substantial precision and recall (0.696 and 0.542 for the general comments and 0.434 and 0.583 for the inline ones, respectively). These results motivate further research on the impact of confusion on the code review process. Moreover, other researchers can employ the proposed classifiers to analyze confusion in other contexts where software development-related discussions occur, such as mailing lists.
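For readers interpreting the agreement figures: Fleiss' kappa corrects raw multi-rater agreement for the agreement expected by chance, and on the widely used Landis and Koch scale values between 0.41 and 0.60 count as moderate agreement, which is how the 0.59 and 0.49 above are read. A standard formulation, with n raters, N items, and n_{ij} the number of raters assigning item i to category j:

```latex
% Fleiss' kappa: chance-corrected agreement among n raters over N items.
\[
  P_i = \frac{1}{n(n-1)}\Bigl(\sum_{j} n_{ij}^{2} - n\Bigr), \qquad
  \bar{P} = \frac{1}{N}\sum_{i=1}^{N} P_i, \qquad
  \bar{P}_e = \sum_{j} p_j^{2}, \qquad
  \kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}
\]
% where p_j is the overall proportion of all assignments made to category j.
```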
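The abstract does not specify the features or learning algorithm behind the classifiers, so the following is only a minimal sketch of the precision/recall trade-off it describes: a generic text classifier (TF-IDF plus logistic regression, both illustrative choices) whose decision threshold is raised for a high-precision variant or lowered for a high-recall one. The comments and labels are hypothetical stand-ins for the 800 manually classified review comments.

```python
# Minimal sketch, NOT the authors' implementation: a binary "confusion"
# classifier over review comments with a tunable decision threshold.
# TF-IDF + logistic regression are illustrative choices; the labeled
# comments below are hypothetical stand-ins for the paper's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

comments = [
    "Why is this null check needed here?",          # expresses confusion
    "I don't understand what this flag controls.",  # expresses confusion
    "Looks good to me.",                            # no confusion
    "Please rename this variable for clarity.",     # no confusion
]
labels = [1, 1, 0, 0]  # 1 = expresses confusion

X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.5, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# Trading precision against recall: a higher threshold yields a
# high-precision classifier, a lower one a high-recall classifier,
# mirroring the precision- vs. recall-oriented variants in the abstract.
probs = model.predict_proba(X_test)[:, 1]
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, preds, zero_division=0):.2f} "
          f"recall={recall_score(y_test, preds, zero_division=0):.2f}")
```

On realistic data one would, of course, select the threshold on a held-out validation set rather than the test set, and report per-kind results (general vs. inline comments) as the paper does.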