Misha Kakkar, Eleni Constantinou, Apostolos Ampatzoglou, G. Robles, Jesus M. Gonzalez-Barahona, Daniel Izquierdo-Cortazar
International Journal of Open Source Software and Processes, pp. 1-19. Journal Article, published 2018-01-01. DOI: 10.4018/IJOSSP.2018010101 (https://doi.org/10.4018/IJOSSP.2018010101)
Citation count: 7
Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction
Software Defect Prediction (SDP) models are used to predict whether software is clean or buggy, using historical data collected from various software repositories. The data collected from such repositories may contain missing values. To estimate missing values, imputation techniques are used, which utilize the complete observed values in the dataset. The objective of this study is to identify the best-suited imputation technique for handling missing values in SDP datasets. In addition to identifying the imputation technique, the authors have investigated the most appropriate combination of imputation technique and data preprocessing method for building an SDP model. In this study, four combinations of imputation technique and data preprocessing method are examined using the improved NASA datasets. These combinations are used along with five different machine-learning algorithms to develop models. The performance of these SDP models is then compared using traditional performance indicators. Experimental results show that, among the different imputation techniques, linear regression gives the most accurate imputed values. The combination of linear regression with a correlation-based feature selector outperforms all other combinations. To validate the significance of data preprocessing methods with imputation, the findings are applied to open source projects. It was concluded that the results are consistent with the above conclusion.
Keywords: Feature Selection, Instance Selection, Missing Value Imputation, Software Defect Prediction
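
To make the idea of linear regression imputation concrete, here is a minimal sketch in Python. It is not the authors' implementation, which the abstract does not describe. It assumes a single fully observed predictor metric and a target metric with some missing entries; the function names (`fit_line`, `impute_linear`) and the metric values are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two complete rows to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_linear(
    predictor: Sequence[float], target: Sequence[Optional[float]]
) -> List[float]:
    """Return a copy of `target` with None entries replaced by predictions.

    The line is fitted only on rows where `target` is observed, so the
    imputed values are derived from the complete part of the dataset.
    """
    complete = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line(
        [x for x, _ in complete], [y for _, y in complete]
    )
    return [
        slope * x + intercept if y is None else y
        for x, y in zip(predictor, target)
    ]


# Hypothetical module metrics, invented for illustration only.
loc = [10.0, 20.0, 30.0, 40.0, 50.0]            # fully observed predictor
complexity = [3.0, 5.0, None, 9.0, 11.0]        # one missing value
imputed = impute_linear(loc, complexity)
print(imputed)
```

In this toy data the observed rows lie on a straight line, so the missing entry is filled with the value on that line. Real SDP datasets would typically use several predictor metrics and a multivariate regression, and the imputed dataset would then be passed to a feature selection step before model building.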