Large language model ChatGPT versus small deep learning models for self-admitted technical debt detection: Why not together?

Software: Practice and Experience. Pub Date: 2024-06-28. DOI: 10.1002/spe.3360

Jun Li, Lixian Li, Jin Liu, Xiao Yu, Xiao Liu, Jacky Wai Keung


Citations: 0

Abstract

Given the increasing complexity and volume of Self-Admitted Technical Debts (SATDs), detecting them efficiently has become critical in software engineering practice for improving code quality and project efficiency. Although current deep learning methods perform well at detecting SATDs in code comments, they do not explain their decisions. Large language models such as ChatGPT are increasingly applied to text classification tasks because they can explain their classification results, but it is unclear how effective ChatGPT is for SATD classification. As the first in-depth study of ChatGPT for SATD detection, we evaluate ChatGPT's effectiveness, compare it with small deep learning models, and find that ChatGPT performs better on Recall, while small models perform better on Precision. Furthermore, to enhance the performance of these approaches, we propose a novel fusion approach named FSATD, which combines ChatGPT with small models for SATD detection so as to provide reliable explanations. Through extensive experiments on 62,276 comments from 10 open-source projects, we show that FSATD outperforms existing methods on F1-score in cross-project scenarios. Additionally, FSATD allows flexible adjustment of fusion strategies to suit the requirements of different application scenarios, and can achieve the best Precision, Recall, or F1-score.
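The abstract does not describe how FSATD fuses the two classifiers, so the following is only a minimal sketch of why a fusion strategy can be tuned toward Precision, Recall, or F1-score. It assumes two binary predictors (one recall-leaning, one precision-leaning) and two generic combination rules, "and" and "or". The function names `score` and `fuse`, the rules themselves, and the toy labels are all hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(y_true: list[int], y_pred: list[int]) -> Scores:
    """Precision, recall and F1 for the positive (SATD) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


def fuse(pred_a: list[int], pred_b: list[int], strategy: str) -> list[int]:
    """Combine two binary predictions (hypothetical rules, not FSATD's).

    "and": flag a comment only if both models flag it (fewer false positives).
    "or":  flag a comment if either model flags it (fewer false negatives).
    """
    if strategy == "and":
        return [a & b for a, b in zip(pred_a, pred_b)]
    if strategy == "or":
        return [a | b for a, b in zip(pred_a, pred_b)]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Made-up labels for eight comments: 1 = SATD, 0 = not SATD.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    llm_pred = [1, 1, 1, 1, 1, 1, 0, 0]    # recall-leaning predictor
    small_pred = [1, 1, 0, 0, 0, 0, 1, 0]  # second predictor

    for name, pred in [
        ("llm", llm_pred),
        ("small", small_pred),
        ("and", fuse(llm_pred, small_pred, "and")),
        ("or", fuse(llm_pred, small_pred, "or")),
    ]:
        print(name, score(y_true, pred))
```

On this toy data the "and" rule removes every false positive and the "or" rule removes every false negative, which is the kind of trade-off a configurable fusion strategy exposes. Real SATD detectors would produce these predictions from code comments rather than from hard-coded lists.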
