{"title":"软件缺陷预测的半监督方法","authors":"Huihua Lu, B. Cukic, M. Culp","doi":"10.1109/COMPSAC.2014.65","DOIUrl":null,"url":null,"abstract":"Accurate detection of software components that need to be exposed to additional verification and validation offers the path to high quality products while minimizing non essential software assurance expenditures. In this type of quality modeling we assume that software modules with known fault content developed in similar environment are available. Supervised learning algorithms are the traditional methods of choice for training on existing modules. The models are then used to predict fault content for newly developed software components prior to product release. However, one needs to realize that establishing whether a module contains a fault or not, only to be used for model training, can be expensive. The basic idea behind semi-supervised learning is to learn from a small number of software modules with known fault content and supplement model training with modules for which the fault information is not available, thus reducing the overall cost of quality assurance. In this study, we investigate the performance of semi-supervised learning for software fault prediction. A preprocessing strategy, multidimensional scaling, is embedded in the approach to reduce the dimensional complexity of software metrics used for prediction. Our results show that the dimension-reduction with semi-supervised learning algorithm preforms significantly better than one of the best performing supervised learning algorithm - random forest - in situations when few modules with known fault content are available. We compare our results with the published benchmarks and clearly demonstrate performance benefits.","PeriodicalId":106871,"journal":{"name":"2014 IEEE 38th Annual Computer Software and Applications Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"A Semi-supervised Approach to Software Defect Prediction\",\"authors\":\"Huihua Lu, B. Cukic, M. Culp\",\"doi\":\"10.1109/COMPSAC.2014.65\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate detection of software components that need to be exposed to additional verification and validation offers the path to high quality products while minimizing non essential software assurance expenditures. In this type of quality modeling we assume that software modules with known fault content developed in similar environment are available. Supervised learning algorithms are the traditional methods of choice for training on existing modules. The models are then used to predict fault content for newly developed software components prior to product release. However, one needs to realize that establishing whether a module contains a fault or not, only to be used for model training, can be expensive. The basic idea behind semi-supervised learning is to learn from a small number of software modules with known fault content and supplement model training with modules for which the fault information is not available, thus reducing the overall cost of quality assurance. In this study, we investigate the performance of semi-supervised learning for software fault prediction. A preprocessing strategy, multidimensional scaling, is embedded in the approach to reduce the dimensional complexity of software metrics used for prediction. Our results show that the dimension-reduction with semi-supervised learning algorithm preforms significantly better than one of the best performing supervised learning algorithm - random forest - in situations when few modules with known fault content are available. We compare our results with the published benchmarks and clearly demonstrate performance benefits.\",\"PeriodicalId\":106871,\"journal\":{\"name\":\"2014 IEEE 38th Annual Computer Software and Applications Conference\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE 38th Annual Computer Software and Applications Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMPSAC.2014.65\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 38th Annual Computer Software and Applications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC.2014.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Semi-supervised Approach to Software Defect Prediction
Accurate detection of software components that need to be exposed to additional verification and validation offers the path to high quality products while minimizing non essential software assurance expenditures. In this type of quality modeling we assume that software modules with known fault content developed in similar environment are available. Supervised learning algorithms are the traditional methods of choice for training on existing modules. The models are then used to predict fault content for newly developed software components prior to product release. However, one needs to realize that establishing whether a module contains a fault or not, only to be used for model training, can be expensive. The basic idea behind semi-supervised learning is to learn from a small number of software modules with known fault content and supplement model training with modules for which the fault information is not available, thus reducing the overall cost of quality assurance. In this study, we investigate the performance of semi-supervised learning for software fault prediction. A preprocessing strategy, multidimensional scaling, is embedded in the approach to reduce the dimensional complexity of software metrics used for prediction. Our results show that the dimension-reduction with semi-supervised learning algorithm preforms significantly better than one of the best performing supervised learning algorithm - random forest - in situations when few modules with known fault content are available. We compare our results with the published benchmarks and clearly demonstrate performance benefits.