我们在识别自我承认的技术债务方面取得了多大的进展?综合实证研究

ACM Transactions on Software Engineering and Methodology (TOSEM) Pub Date : 2021-07-23 DOI:10.1145/3447247

Zhaoqiang Guo, Shiran Liu, Jinping Liu, Yanhui Li, Lin Chen, Hongmin Lu, Yuming Zhou

{"title":"我们在识别自我承认的技术债务方面取得了多大的进展?综合实证研究","authors":"Zhaoqiang Guo, Shiran Liu, Jinping Liu, Yanhui Li, Lin Chen, Hongmin Lu, Yuming Zhou","doi":"10.1145/3447247","DOIUrl":null,"url":null,"abstract":"Background. Self-admitted technical debt (SATD) is a special kind of technical debt that is intentionally introduced and remarked by code comments. Those technical debts reduce the quality of software and increase the cost of subsequent software maintenance. Therefore, it is necessary to find out and resolve these debts in time. Recently, many automatic approaches have been proposed to identify SATD. Problem. Popular IDEs support a number of predefined task annotation tags for indicating SATD in comments, which have been used in many projects. However, such clear prior knowledge is neglected by existing SATD identification approaches when identifying SATD. Objective. We aim to investigate how far we have really progressed in the field of SATD identification by comparing existing approaches with a simple approach that leverages the predefined task tags to identify SATD. Method. We first propose a simple heuristic approach that fuzzily Matches task Annotation Tags (MAT) in comments to identify SATD. In nature, MAT is an unsupervised approach, which does not need any data to train a prediction model and has a good understandability. Then, we examine the real progress in SATD identification by comparing MAT against existing approaches. Result. The experimental results reveal that: (1) MAT has a similar or even superior performance for SATD identification compared with existing approaches, regardless of whether non-effort-aware or effort-aware evaluation indicators are considered; (2) the SATDs (or non-SATDs) correctly identified by existing approaches are highly overlapped with those identified by MAT; and (3) supervised approaches misclassify many SATDs marked with task tags as non-SATDs, which can be easily corrected by their combinations with MAT. Conclusion. It appears that the problem of SATD identification has been (unintentionally) complicated by our community, i.e., the real progress in SATD comments identification is not being achieved as it might have been envisaged. We hence suggest that, when many task tags are used in the comments of a target project, future SATD identification studies should use MAT as an easy-to-implement baseline to demonstrate the usefulness of any newly proposed approach.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"71 1","pages":"1 - 56"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"How Far Have We Progressed in Identifying Self-admitted Technical Debts? A Comprehensive Empirical Study\",\"authors\":\"Zhaoqiang Guo, Shiran Liu, Jinping Liu, Yanhui Li, Lin Chen, Hongmin Lu, Yuming Zhou\",\"doi\":\"10.1145/3447247\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background. Self-admitted technical debt (SATD) is a special kind of technical debt that is intentionally introduced and remarked by code comments. Those technical debts reduce the quality of software and increase the cost of subsequent software maintenance. Therefore, it is necessary to find out and resolve these debts in time. Recently, many automatic approaches have been proposed to identify SATD. Problem. Popular IDEs support a number of predefined task annotation tags for indicating SATD in comments, which have been used in many projects. However, such clear prior knowledge is neglected by existing SATD identification approaches when identifying SATD. Objective. We aim to investigate how far we have really progressed in the field of SATD identification by comparing existing approaches with a simple approach that leverages the predefined task tags to identify SATD. Method. We first propose a simple heuristic approach that fuzzily Matches task Annotation Tags (MAT) in comments to identify SATD. In nature, MAT is an unsupervised approach, which does not need any data to train a prediction model and has a good understandability. Then, we examine the real progress in SATD identification by comparing MAT against existing approaches. Result. The experimental results reveal that: (1) MAT has a similar or even superior performance for SATD identification compared with existing approaches, regardless of whether non-effort-aware or effort-aware evaluation indicators are considered; (2) the SATDs (or non-SATDs) correctly identified by existing approaches are highly overlapped with those identified by MAT; and (3) supervised approaches misclassify many SATDs marked with task tags as non-SATDs, which can be easily corrected by their combinations with MAT. Conclusion. It appears that the problem of SATD identification has been (unintentionally) complicated by our community, i.e., the real progress in SATD comments identification is not being achieved as it might have been envisaged. We hence suggest that, when many task tags are used in the comments of a target project, future SATD identification studies should use MAT as an easy-to-implement baseline to demonstrate the usefulness of any newly proposed approach.\",\"PeriodicalId\":7398,\"journal\":{\"name\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"volume\":\"71 1\",\"pages\":\"1 - 56\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3447247\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology (TOSEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

背景。自承认技术债(SATD)是一种特殊的技术债，它是由代码注释有意引入和注释的。这些技术债务降低了软件的质量，增加了后续软件维护的成本。因此，有必要及时发现和解决这些债务。近年来，人们提出了许多自动识别SATD的方法。问题。流行的ide支持许多预定义的任务注释标记，用于在注释中指示SATD，这些标记已在许多项目中使用。然而，现有的SATD识别方法在识别SATD时忽略了这种明确的先验知识。目标。我们的目标是通过比较现有方法和利用预定义任务标签识别SATD的简单方法，来调查我们在SATD识别领域的真正进展。方法。我们首先提出了一种简单的启发式方法，在注释中模糊匹配任务注释标签(MAT)来识别SATD。本质上，MAT是一种无监督的方法，它不需要任何数据来训练预测模型，具有很好的可理解性。然后，我们通过比较MAT与现有方法来检查SATD识别的实际进展。结果。实验结果表明:(1)无论考虑非努力意识评价指标还是努力意识评价指标，与现有方法相比，MAT对SATD的识别都具有相似甚至更好的性能;(2)现有方法正确识别的SATDs(或非SATDs)与MAT识别的SATDs高度重叠;(3)监督方法将许多带有任务标签的satd错误地分类为非satd，这可以通过与MAT的组合来很容易地纠正。我们的社区似乎(无意中)把SATD识别问题复杂化了，也就是说，SATD评论识别的真正进展并没有像设想的那样实现。因此，我们建议，当在目标项目的注释中使用许多任务标签时，未来的SATD识别研究应该使用MAT作为易于实现的基线，以证明任何新提出的方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

How Far Have We Progressed in Identifying Self-admitted Technical Debts? A Comprehensive Empirical Study

Background. Self-admitted technical debt (SATD) is a special kind of technical debt that is intentionally introduced and remarked by code comments. Those technical debts reduce the quality of software and increase the cost of subsequent software maintenance. Therefore, it is necessary to find out and resolve these debts in time. Recently, many automatic approaches have been proposed to identify SATD. Problem. Popular IDEs support a number of predefined task annotation tags for indicating SATD in comments, which have been used in many projects. However, such clear prior knowledge is neglected by existing SATD identification approaches when identifying SATD. Objective. We aim to investigate how far we have really progressed in the field of SATD identification by comparing existing approaches with a simple approach that leverages the predefined task tags to identify SATD. Method. We first propose a simple heuristic approach that fuzzily Matches task Annotation Tags (MAT) in comments to identify SATD. In nature, MAT is an unsupervised approach, which does not need any data to train a prediction model and has a good understandability. Then, we examine the real progress in SATD identification by comparing MAT against existing approaches. Result. The experimental results reveal that: (1) MAT has a similar or even superior performance for SATD identification compared with existing approaches, regardless of whether non-effort-aware or effort-aware evaluation indicators are considered; (2) the SATDs (or non-SATDs) correctly identified by existing approaches are highly overlapped with those identified by MAT; and (3) supervised approaches misclassify many SATDs marked with task tags as non-SATDs, which can be easily corrected by their combinations with MAT. Conclusion. It appears that the problem of SATD identification has been (unintentionally) complicated by our community, i.e., the real progress in SATD comments identification is not being achieved as it might have been envisaged. We hence suggest that, when many task tags are used in the comments of a target project, future SATD identification studies should use MAT as an easy-to-implement baseline to demonstrate the usefulness of any newly proposed approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Software Engineering and Methodology (TOSEM)

自引率

0.00%

发文量