When less is more: on the value of “co-training” for semi-supervised software defect predictors

IF 3.5 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Empirical Software Engineering Pub Date : 2024-02-24 DOI:10.1007/s10664-023-10418-4

Suvodeep Majumder, Joymallya Chakraborty, Tim Menzies

Citations: 0

Abstract

Labeling a module as defective or non-defective is an expensive task. Hence, there are often limits on how much labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models. However, there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to name a few. Only a handful of these methods have been tested in SE for (e.g.) predicting defects, and even there, those methods have been tested on just a handful of projects. This paper applies a wide range of 55 semi-supervised learners to over 714 projects. We find that semi-supervised “co-training methods” work significantly better than other approaches. Specifically, after labeling just 2.5% of the data, they make predictions that are competitive with those using 100% of the data. That said, co-training needs to be used cautiously, since the specific co-training method must be carefully selected based on a user’s specific goals. Also, we warn that a commonly used co-training method (“multi-view”, where different learners get different sets of columns) does not improve predictions, while adding too much to the run-time cost (11 hours vs. 1.8 hours). It is an open question, worthy of future work, to test if these reductions can be seen in other areas of software analytics. To assist with exploring other areas, all the code used is available at https://github.com/ai-se/Semi-Supervised.
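To make the “multi-view” idea concrete, the following is a minimal sketch of co-training on synthetic data. It is not the authors' implementation (their code is in the repository linked above). The nearest-centroid learner, the helper names (`centroids`, `predict`, `co_train`, `vote`), the two column views, and the Gaussian toy data are all assumptions chosen for illustration. Two learners each see a different pair of columns, start from 2.5% labeled rows, and repeatedly add their most confident guesses to a shared labeled pool.

```python
import random

def centroids(rows, labels, cols):
    """Per-class mean of the selected columns (a tiny nearest-centroid learner)."""
    out = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        out[cls] = [sum(r[c] for r in members) / len(members) for c in cols]
    return out

def predict(model, row, cols):
    """Return (label, confidence); confidence is the gap between class distances."""
    dist = {cls: sum((row[c] - m) ** 2 for c, m in zip(cols, mean))
            for cls, mean in model.items()}
    ranked = sorted(dist, key=dist.get)
    conf = dist[ranked[1]] - dist[ranked[0]] if len(ranked) > 1 else 0.0
    return ranked[0], conf

def co_train(labeled, labels, unlabeled, views, rounds=10, per_round=5):
    """Each view's learner pseudo-labels its most confident rows for the shared pool."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for cols in views:
            if not pool:
                break
            model = centroids(labeled, labels, cols)
            scored = sorted(((predict(model, r, cols), i) for i, r in enumerate(pool)),
                            key=lambda t: -t[0][1])
            picked = scored[:per_round]
            for (guess, _), i in picked:
                labeled.append(pool[i])
                labels.append(guess)
            taken = {i for _, i in picked}
            pool = [r for i, r in enumerate(pool) if i not in taken]
    return [centroids(labeled, labels, cols) for cols in views]

def vote(models, views, row):
    """Final label: trust whichever view is more confident on this row."""
    return max((predict(m, row, cols) for m, cols in zip(models, views)),
               key=lambda p: p[1])[0]

# Hypothetical toy data: class 1 ("defective") is shifted by 3 in every column.
random.seed(1)
truth = [i % 2 for i in range(400)]
rows = [[random.gauss(3.0 * y, 1.0) for _ in range(4)] for y in truth]

n_labeled = 10                                  # 2.5% of 400 rows
labeled, labels = rows[:n_labeled], truth[:n_labeled]  # alternating, so 5 per class
unlabeled = rows[n_labeled:]
views = [(0, 1), (2, 3)]                        # each learner gets its own columns

models = co_train(labeled, labels, unlabeled, views)
accuracy = sum(vote(models, views, r) == y for r, y in zip(rows, truth)) / len(rows)
print(f"accuracy with {n_labeled} labels: {accuracy:.2f}")
```

The toy classes here are well separated, so the result says nothing about real defect data; the abstract itself reports that the multi-view variant did not improve predictions in the authors' experiments. A single-view variant can be tried by passing `views = [(0, 1, 2, 3)]`.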

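Source journal

Empirical Software Engineering (Engineering and Technology, Computer Science: Software Engineering)

CiteScore: 8.50

Self-citation rate: 12.20%

Articles published: 169

Review time: >12 weeks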

About the journal: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies (processes, methods, or tools) and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.