A Comparative Study on the Stability of Software Metric Selection Techniques

Huanjing Wang, T. Khoshgoftaar, Randall Wald, Amri Napolitano
2012 11th International Conference on Machine Learning and Applications, December 12, 2012
DOI: 10.1109/ICMLA.2012.142
Citations: 4
Abstract
In large software projects, software quality prediction is an important aspect of the development cycle, helping to focus quality assurance efforts on the modules most likely to contain faults. To perform software quality prediction, various software metrics are collected during the software development cycle, and models are built using these metrics. However, not all features (metrics) make the same contribution to the class attribute (e.g., faulty/not faulty). Thus, selecting a subset of metrics that are relevant to the class attribute is a critical step. As many feature selection algorithms exist, it is important to find ones that produce consistent results even as the underlying data is changed; this quality of producing consistent results is referred to as "stability." In this paper, we investigate the stability of seven feature selection techniques in the context of software quality classification. We compare four approaches for varying the underlying data to evaluate stability: the traditional approach of generating many subsamples of the original data and comparing the features selected from each; an earlier approach developed by our research group, which compares the features selected from subsamples of the data with those selected from the original; and two newly proposed approaches based on comparing pairs of subsamples specifically designed to have the same number of instances and a specified level of overlap, with one of these new approaches comparing within each pair while the other compares the generated subsamples with the original dataset. The empirical validation is carried out on sixteen software metrics datasets. Our results show that ReliefF is the most stable feature selection technique. Results also show that the level of overlap, degree of perturbation, and feature subset size all affect the stability of feature selection methods. Finally, we find that all four approaches to evaluating stability produce similar results in terms of which feature selection techniques are best under different circumstances.
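To make the paired-subsample idea concrete, the sketch below shows one plausible way to evaluate feature selection stability on overlap-controlled subsample pairs. It is not the authors' exact procedure: the abstract does not specify the agreement measure or selection algorithm, so Kuncheva's consistency index stands in for the stability metric, a mutual-information filter stands in for one of the seven ranking techniques, and the helpers `paired_subsamples` and `top_k` along with all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

def consistency_index(s1, s2, n_features):
    """Kuncheva's consistency index for two feature subsets of equal
    size k drawn from n_features features; 1 = identical subsets,
    values near 0 = chance-level overlap."""
    k = len(s1)
    if k == 0 or k == n_features:
        return 0.0
    r = len(s1 & s2)
    return (r * n_features - k * k) / (k * (n_features - k))

def paired_subsamples(n, size, n_shared, rng):
    """Two index sets of `size` instances sharing exactly `n_shared`
    instances (the specified overlap); requires n >= 2*size - n_shared."""
    perm = rng.permutation(n)
    shared = perm[:n_shared]
    extra = size - n_shared
    a = np.concatenate([shared, perm[n_shared:n_shared + extra]])
    b = np.concatenate([shared, perm[n_shared + extra:n_shared + 2 * extra]])
    return a, b

def top_k(X, y, k):
    """Stand-in filter ranker: keep the k features with the highest
    mutual information with the class attribute."""
    scores = mutual_info_classif(X, y, random_state=0)
    return set(np.argsort(scores)[-k:])

# Toy data standing in for one of the sixteen software metrics datasets.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
rng = np.random.default_rng(0)
k, size, n_shared = 5, 150, 75  # subset size, subsample size, overlap

scores = []
for _ in range(10):
    a, b = paired_subsamples(len(y), size, n_shared, rng)
    sa, sb = top_k(X[a], y[a], k), top_k(X[b], y[b], k)
    scores.append(consistency_index(sa, sb, X.shape[1]))
print(f"mean pairwise stability: {np.mean(scores):.3f}")
```

Varying `n_shared` (level of overlap), `size` relative to the full dataset (degree of perturbation), and `k` (feature subset size) in this loop mirrors the three factors the paper reports as affecting stability; the within-pair comparison above corresponds to one of the two newly proposed approaches, while comparing each subsample's subset against `top_k(X, y, k)` on the full data would correspond to the other.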