Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density

2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), Pub Date: 2016-10-12, DOI: 10.1109/QRS.2016.31

Kazuhiro Yamashita, Changyun Huang, M. Nagappan, Yasutaka Kamei, A. Mockus, A. Hassan, Naoyasu Ubayashi


Citations: 21

Abstract



Practical guidelines on what code has better quality are in great demand. For example, it is reasonable to expect the most complex code to be buggy. Structuring code into reasonably sized files and classes also appears to be prudent. Many attempts to determine (or declare) risk thresholds for various code metrics have been made. In this paper we want to examine the applicability of such thresholds. Hence, we replicate a recently published technique for calculating metric thresholds to determine high-risk files based on code size (LOC and number of methods), and complexity (cyclomatic complexity and module interface coupling) using a very large set of open and closed source projects written primarily in Java. We relate the threshold-derived risk to (a) the probability that a file would have a defect, and (b) the defect density of the files in the high-risk group. We find that the probability of a file having a defect is higher in the very high-risk group with a few exceptions. This is particularly pronounced when using size thresholds. Surprisingly, the defect density was uniformly lower in the very high-risk group of files. Our results suggest that, as expected, less code is associated with fewer defects. However, the same amount of code in large and complex files was associated with fewer defects than when located in smaller and less complex files. Hence we conclude that risk thresholds for size and complexity metrics have to be used with caution if at all. Our findings have immediate practical implications: the redistribution of Java code into smaller and less complex files may be counterproductive.
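The abstract's two measures, the probability that a file has a defect and the defect density of a risk group, can move in opposite directions. The following minimal sketch shows how that can happen. It is not the paper's threshold technique or data: the `FileStats` type, the helper functions, the single LOC cutoff of 1000, and every number in `files` are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    loc: int      # lines of code in the file (hypothetical values below)
    defects: int  # defects attributed to the file (hypothetical values below)


def split_by_threshold(files, loc_threshold):
    """Split files into a lower-risk and a higher-risk group by size."""
    low = [f for f in files if f.loc <= loc_threshold]
    high = [f for f in files if f.loc > loc_threshold]
    return low, high


def summarize(files):
    """Return (defect probability, defects per KLOC) for one group."""
    defective_files = sum(1 for f in files if f.defects > 0)
    total_defects = sum(f.defects for f in files)
    total_loc = sum(f.loc for f in files)
    probability = defective_files / len(files)
    density = 1000 * total_defects / total_loc
    return probability, density


# Made-up example data, not taken from the study.
files = [
    FileStats(100, 0), FileStats(120, 1), FileStats(80, 0), FileStats(200, 1),
    FileStats(2000, 3), FileStats(3000, 2),
]

# Assumed cutoff for illustration; the paper derives its thresholds differently.
low, high = split_by_threshold(files, loc_threshold=1000)
low_prob, low_density = summarize(low)
high_prob, high_density = summarize(high)

print(f"low-risk:  P(defect)={low_prob:.2f}, density={low_density:.2f}/KLOC")
print(f"high-risk: P(defect)={high_prob:.2f}, density={high_density:.2f}/KLOC")
```

In this toy data set every large file has at least one defect, so the high-risk group has the higher defect probability, yet its defects are spread over far more code, so its defect density is lower. That is the same pattern the abstract reports, although the numbers here say nothing about the magnitude observed in the study.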
