Leveraging Reviewer Experience in Code Review Comment Generation
Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Michael W. Godfrey, Chunhua Liu, Wachiraphan Charoenwet
arXiv - CS - Software Engineering, 2024-09-17, https://doi.org/arxiv-2409.10959
Abstract
Modern code review is a ubiquitous software quality assurance process aimed at identifying potential issues within newly written code. Despite its effectiveness, the process demands a large amount of effort from the human reviewers involved. To help alleviate this workload, researchers have trained deep learning models to imitate human reviewers in providing natural language code reviews. Formally, this task is known as code review comment generation. Prior work has demonstrated improvements in this task by leveraging machine learning techniques and neural models, such as transfer learning and the transformer architecture. However, the quality of model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training. This is in part because the data is drawn from open-source projects where code reviews are conducted in a public forum and reviewers possess varying levels of software development experience, which can affect the quality of their feedback. To account for this variation, we propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experience as signals for review quality. Specifically, we propose experience-aware loss functions (ELF), which use the reviewers' authoring and reviewing ownership of a project as weights in the model's loss function. Through this method, experienced reviewers' code reviews exert a larger influence over the model's behaviour. Compared to the SOTA model, ELF was able to generate higher-quality reviews in terms of accuracy, informativeness, and the comment types generated. The key contribution of this work is the demonstration of how traditional software engineering concepts such as reviewer experience can be integrated into the design of AI-based automated code review models.
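The abstract's description of ELF can be read as scaling each training example's loss by the reviewer's experience, so that comments from high-ownership reviewers contribute more to the gradient. The sketch below shows one plausible realisation in PyTorch; the function name, the specific weighting scheme (raw ownership clipped to a floor), and the tensor layout are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of an experience-weighted sequence loss (PyTorch assumed).
# The weighting scheme is an illustrative assumption, not the paper's exact ELF.
import torch
import torch.nn.functional as F

def experience_weighted_loss(logits, targets, ownership, pad_id=0, floor=0.1):
    """Cross-entropy over review tokens, scaled per example by the
    reviewer's project ownership (e.g. share of past commits/reviews).

    logits:    (batch, seq_len, vocab) decoder outputs
    targets:   (batch, seq_len) gold review-comment token ids
    ownership: (batch,) reviewer ownership score in [0, 1]
    """
    # Per-token cross-entropy, ignoring padding.
    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,
        reduction="none",
    ).reshape(targets.shape)

    # Average over the non-padding tokens of each example.
    mask = (targets != pad_id).float()
    per_example = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

    # Experienced reviewers' comments get larger weight; a floor keeps
    # low-ownership examples from being ignored entirely.
    weights = ownership.clamp(min=floor)
    return (weights * per_example).mean()
```

In this reading, the design choice is that experience modulates rather than filters the training data: every review still contributes to the loss, but comments from reviewers with greater authoring and reviewing ownership pull the model's behaviour more strongly.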