/* iComment: Bugs or Bad Comments? */

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. Pub Date: 2007-10-14. DOI: 10.1145/1294261.1294276

Lin Tan, Ding Yuan, G. Krishna, Yuanyuan Zhou

Pages: 145-158

Citations: 278

Abstract

Commenting source code has long been a common practice in software development. Compared to source code, comments are more direct, descriptive and easy to understand. Comments and source code provide relatively redundant and independent information regarding a program's semantic behavior. As software evolves, they can easily grow out of sync, indicating two problems: (1) bugs: the source code does not follow the assumptions and requirements specified by correct program comments; (2) bad comments: comments that are inconsistent with correct code, which can confuse and mislead programmers into introducing bugs in subsequent versions. Unfortunately, as most comments are written in natural language, no solution has been proposed to automatically analyze comments and detect inconsistencies between comments and source code. This paper takes the first step in automatically analyzing comments written in natural language to extract implicit program rules and use these rules to automatically detect inconsistencies between comments and source code, indicating either bugs or bad comments. Our solution, iComment, combines Natural Language Processing (NLP), Machine Learning, Statistics and Program Analysis techniques to achieve these goals. We evaluate iComment on four large code bases: Linux, Mozilla, Wine and Apache. Our experimental results show that iComment automatically extracts 1832 rules from comments with 90.8-100% accuracy and detects 60 comment-code inconsistencies (33 new bugs and 27 bad comments) in the latest versions of the four programs. Nineteen of them (12 bugs and 7 bad comments) have already been confirmed by the corresponding developers, while the others are currently being analyzed by the developers.
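To make the idea of extracting a rule from a comment and checking code against it concrete, here is a toy sketch. It is not iComment's pipeline: a single regular expression stands in for the NLP and machine-learning stages, a line-by-line scan stands in for program analysis, and the rule shape, the names `LockRule`, `extract_rules`, `find_violations`, and the `acquire(...)`/`release(...)` calls are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LockRule:
    """Hypothetical rule shape: `lock` must be held when `func` is called."""
    lock: str
    func: str


# Assumption: one fixed phrasing. A regex replaces the NLP/ML stages here.
RULE_PATTERN = re.compile(
    r"caller must hold (\w+) before calling (\w+)", re.IGNORECASE
)


def extract_rules(comment: str) -> List[LockRule]:
    """Turn every matching phrase in a comment into a LockRule."""
    return [LockRule(lock, func) for lock, func in RULE_PATTERN.findall(comment)]


def find_violations(rule: LockRule, code_lines: List[str]) -> List[int]:
    """Return 1-based line numbers where rule.func is called without the lock.

    Assumes straight-line code with one statement per line and the
    hypothetical calls acquire(<lock>) and release(<lock>).
    """
    held = False
    violations = []
    for number, line in enumerate(code_lines, start=1):
        if f"acquire({rule.lock})" in line:
            held = True
        elif f"release({rule.lock})" in line:
            held = False
        elif f"{rule.func}(" in line and not held:
            violations.append(number)
    return violations


comment = "/* Caller must hold queue_lock before calling reset_queue */"
code = [
    "acquire(queue_lock);",
    "reset_queue(q);",
    "release(queue_lock);",
    "reset_queue(q);",  # called after the lock was released
]

for rule in extract_rules(comment):
    print(rule, find_violations(rule, code))
```

A reported line only marks a mismatch between the comment and the code. As the abstract notes, such a mismatch may be a bug in the code or a bad comment, so deciding which one it is still requires a developer.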
