A machine learning technique for predicting the productivity of practitioners from individually developed software projects
Cuauhtémoc López Martín, Arturo Chavoya-Pena, M. Meda-Campaña
15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2014-09-01
DOI: https://doi.org/10.1109/SNPD.2014.6888690
Citations: 8
Abstract
Context: Managing the productivity of software developers is a challenge in Information and Communication Technology. Productivity predictions can help determine corrective actions and assist managers in evaluating improvement alternatives. Productivity prediction models have been based on statistical regression, statistical time series, fuzzy logic, and machine learning.
Goal: To propose a machine learning model, the general regression neural network (GRNN), for predicting the productivity of software practitioners.
Hypothesis: The prediction accuracy of a GRNN is better than that of a statistical regression model when both models are applied to predict the productivity of software practitioners who have individually developed their software projects.
Method: A sample of 396 software projects developed between 2005 and 2011 by 99 practitioners was used to train the models, and a sample of 60 projects developed by 15 practitioners in the first months of 2012 was used to test them. All projects were developed following a disciplined development process within a controlled environment. The accuracy of the GRNN was compared against that of a multiple regression (MLR) model. The criteria for evaluating the accuracy of the two models were the Magnitude of Error Relative to the estimate (MER) and a paired t-test.
Results: The prediction accuracy of the GRNN was statistically better than that of the MLR model at the 99% confidence level.
Conclusion: A GRNN could be applied to predict the productivity of practitioners when New and Changed lines of code, reused code, and the practitioners' programming language experience are used as independent variables.
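As a rough illustration of the two technical ingredients named in the abstract, the sketch below shows the standard GRNN prediction rule (a Gaussian-kernel-weighted average of the training targets) and the MER criterion, |actual - predicted| / predicted. It is not the authors' implementation: the function names, the smoothing parameter `sigma`, the assumption that inputs are scaled to [0, 1], and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Standard GRNN output: kernel-weighted average of training targets.

    train_x: list of feature tuples (assumed already scaled)
    train_y: list of target values, one per training pattern
    query:   feature tuple to predict for
    sigma:   smoothing parameter (hypothetical default)
    """
    weights = []
    for x in train_x:
        # Squared Euclidean distance between the query and a stored pattern.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed; fall back to the mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def mer(actual, predicted):
    """Magnitude of Error Relative to the estimate for one project."""
    return abs(actual - predicted) / abs(predicted)


if __name__ == "__main__":
    # Toy, made-up data: (new and changed LOC, reused LOC, experience),
    # each scaled to [0, 1]; the targets are invented productivity values.
    xs = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    ys = [10.0, 20.0]
    estimate = grnn_predict(xs, ys, (0.5, 0.5, 0.5), sigma=0.5)
    print(estimate, mer(12.0, estimate))
```

A smaller `sigma` makes the prediction follow the nearest training projects more closely, while a larger one pulls it toward the overall mean of the targets. Per-project MER values from two models on the same test projects are the kind of paired samples a paired t-test compares.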