Misha Kakkar, Eleni Constantinou, Apostolos Ampatzoglou, G. Robles, Jesus M. Gonzalez-Barahona, Daniel Izquierdo-Cortazar
Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction

DOI: 10.4018/IJOSSP.2018010101 (https://doi.org/10.4018/IJOSSP.2018010101)
Journal: International Journal of Open Source Software and Processes, volume/issue listed as "4 1", pages 1-19
Publication date: 2018-01-01
Citations: 7

Abstract

Software Defect Prediction (SDP) models are used to predict whether software is clean or buggy, using historical data collected from various software repositories. The data collected from such repositories may contain missing values. To estimate these missing values, imputation techniques are used, which utilize the complete observed values in the dataset. The objective of this study is to identify the best-suited imputation technique for handling missing values in SDP datasets. In addition to identifying the imputation technique, the authors investigated the most appropriate combination of imputation technique and data preprocessing method for building SDP models. In this study, four combinations of imputation technique and data preprocessing method are examined using the improved NASA datasets. These combinations are used along with five different machine-learning algorithms to develop models. The performance of these SDP models is then compared using traditional performance indicators. Experimental results show that, among the different imputation techniques, linear regression gives the most accurate imputed values. The combination of linear regression with a correlation-based feature selector outperforms all other combinations. To validate the significance of combining data preprocessing methods with imputation, the findings are applied to open source projects. It was concluded that these results are consistent with the above conclusion.

Keywords: Feature Selection, Instance Selection, Missing Value Imputation, Software Defect Prediction
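The abstract reports that linear regression produced the most accurate imputed values, but it does not say how the regression was configured. What follows is a minimal sketch, not the authors' implementation. It assumes numeric software metrics stored row-wise, with `None` marking a missing value, and regresses the incomplete metric on all the other metrics using ordinary least squares fitted only on fully observed rows. The names `regression_impute` and `_solve` are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def regression_impute(rows: List[Row], target: int) -> List[Row]:
    """Return a copy of rows where None in column `target` is replaced by an
    OLS prediction from the other columns, fitted on fully observed rows."""
    preds = [j for j in range(len(rows[0])) if j != target]
    complete = [r for r in rows if all(v is not None for v in r)]
    # Design matrix: an intercept column of 1.0 followed by the predictor metrics.
    X = [[1.0] + [float(r[j]) for j in preds] for r in complete]
    y = [float(r[target]) for r in complete]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    out: List[Row] = []
    for r in rows:
        r = list(r)  # copy so the caller's data is not mutated
        # Impute only when every predictor is observed; otherwise leave None.
        if r[target] is None and all(r[j] is not None for j in preds):
            r[target] = beta[0] + sum(b * r[j] for b, j in zip(beta[1:], preds))
        out.append(r)
    return out


if __name__ == "__main__":
    # Hypothetical data following y = 1 + 2*x0 + 3*x1; the last row is missing y.
    data = [[1, 2, 9], [2, 1, 8], [3, 3, 16], [4, 2, 15], [2, 2, None]]
    print(regression_impute(data, target=2)[-1])
```

The example solves the normal equations directly so that it stays dependency-free. With real metric data, a library least-squares solver is numerically safer, and predictors that are themselves incomplete would need to be imputed first.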
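The best-performing combination pairs linear regression imputation with a correlation-based feature selector (CFS). The abstract does not give the CFS configuration, so the following is only an illustrative sketch. It scores a feature subset S of size k with the CFS merit heuristic, Merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. It assumes absolute Pearson correlation as the correlation measure, with buggy = 1 and clean = 0 as class labels, and replaces the best-first search used in Hall's original formulation with a simpler greedy forward search. The names `pearson`, `merit`, and `cfs_forward` are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def merit(subset: List[int], r_cf: List[float], r_ff: List[List[float]]) -> float:
    """CFS merit: rewards class correlation, penalizes inter-feature redundancy."""
    k = len(subset)
    avg_cf = sum(r_cf[i] for i in subset) / k
    pairs = [(i, j) for idx, i in enumerate(subset) for j in subset[idx + 1:]]
    avg_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * avg_cf / math.sqrt(k + k * (k - 1) * avg_ff)


def cfs_forward(columns: List[List[float]], labels: List[float]) -> List[int]:
    """Greedy forward selection: add the feature that most increases merit."""
    m = len(columns)
    r_cf = [abs(pearson(c, labels)) for c in columns]
    r_ff = [[abs(pearson(columns[i], columns[j])) for j in range(m)] for i in range(m)]
    selected: List[int] = []
    best = 0.0
    while True:
        candidates = [f for f in range(m) if f not in selected]
        if not candidates:
            break
        score, f = max((merit(selected + [f], r_cf, r_ff), f) for f in candidates)
        if score <= best + 1e-9:  # stop once merit no longer improves
            break
        selected.append(f)
        best = score
    return selected
```

A feature that merely duplicates an already selected one raises r_ff as much as it raises r_cf, so its merit does not improve and the search stops. This is the redundancy-filtering behavior that motivates CFS.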
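The abstract states that the SDP models were compared using traditional performance indicators but does not name them. The sketch below assumes the common confusion-matrix measures (accuracy, precision, recall, F-measure), with buggy modules treated as the positive class (label 1). The function name `sdp_metrics` is hypothetical, and the paper may have used a different or larger set of indicators.

```python
from typing import Dict, Sequence


def sdp_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics with buggy (1) as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # buggy, predicted buggy
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # clean, predicted clean
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean, predicted buggy
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # buggy, predicted clean
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Defect datasets are usually imbalanced, with few buggy modules, so accuracy alone can look high for a model that predicts everything as clean. Recall and F-measure make that failure visible.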