基于度量的代码气味检测的启发式方法和机器学习方法的比较

2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC) Pub Date : 2019-05-01 DOI:10.1109/ICPC.2019.00023

Fabiano Pecorelli, Fabio Palomba, D. D. Nucci, A. D. Lucia

{"title":"基于度量的代码气味检测的启发式方法和机器学习方法的比较","authors":"Fabiano Pecorelli, Fabio Palomba, D. D. Nucci, A. D. Lucia","doi":"10.1109/ICPC.2019.00023","DOIUrl":null,"url":null,"abstract":"Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on source code maintainability and comprehensibility has been widely shown in the past and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics, namely they compute a set of code metrics and combine them by creating detection rules; while they have a reasonable accuracy, a recent trend is represented by the use of machine learning where code metrics are used as predictors of the smelliness of code artefacts. Despite the recent advances in the field, there is still a noticeable lack of knowledge of whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need of further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while DECOR generally achieves better performance than a machine learning baseline, its precision is still too low to make it usable in practice.","PeriodicalId":6853,"journal":{"name":"2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)","volume":"99 1","pages":"93-104"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"65","resultStr":"{\"title\":\"Comparing Heuristic and Machine Learning Approaches for Metric-Based Code Smell Detection\",\"authors\":\"Fabiano Pecorelli, Fabio Palomba, D. D. Nucci, A. D. Lucia\",\"doi\":\"10.1109/ICPC.2019.00023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on source code maintainability and comprehensibility has been widely shown in the past and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics, namely they compute a set of code metrics and combine them by creating detection rules; while they have a reasonable accuracy, a recent trend is represented by the use of machine learning where code metrics are used as predictors of the smelliness of code artefacts. Despite the recent advances in the field, there is still a noticeable lack of knowledge of whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need of further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while DECOR generally achieves better performance than a machine learning baseline, its precision is still too low to make it usable in practice.\",\"PeriodicalId\":6853,\"journal\":{\"name\":\"2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)\",\"volume\":\"99 1\",\"pages\":\"93-104\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"65\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPC.2019.00023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPC.2019.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 65

摘要

代码气味表示开发人员在增强源代码时执行的不良实现选择。它们对源代码的可维护性和可理解性的负面影响在过去已经被广泛地展示出来，并且已经设计了一些自动检测它们的技术。这些技术大多基于启发式，即它们计算一组代码度量，并通过创建检测规则将它们组合起来;虽然它们具有合理的准确性，但最近的趋势是机器学习的使用，其中代码度量被用作代码工件气味的预测因子。尽管该领域最近取得了进展，但人们仍然明显缺乏关于机器学习是否真的比传统的启发式方法更准确的知识。为了填补这一空白，在本文中，我们提出了一项大规模的研究，以经验比较基于启发式和基于机器学习的基于度量的代码气味检测技术的性能。我们考虑了五种代码气味类型，并将机器学习模型与DECOR(一种最先进的基于启发式的方法)进行比较。关键发现强调了进一步研究的必要性，旨在提高机器学习和启发式方法在代码气味检测方面的有效性:虽然DECOR通常比机器学习基线实现更好的性能，但其精度仍然太低，无法在实践中使用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Comparing Heuristic and Machine Learning Approaches for Metric-Based Code Smell Detection

Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on source code maintainability and comprehensibility has been widely shown in the past and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics, namely they compute a set of code metrics and combine them by creating detection rules; while they have a reasonable accuracy, a recent trend is represented by the use of machine learning where code metrics are used as predictors of the smelliness of code artefacts. Despite the recent advances in the field, there is still a noticeable lack of knowledge of whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need of further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while DECOR generally achieves better performance than a machine learning baseline, its precision is still too low to make it usable in practice.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC)

自引率

0.00%

发文量