{"title":"Data Sets and Data Quality in Software Engineering: Eight Years On","authors":"G. Liebchen, M. Shepperd","doi":"10.1145/2972958.2972967","DOIUrl":"https://doi.org/10.1145/2972958.2972967","url":null,"abstract":"Context: We revisit our review of data quality within the context of empirical software engineering eight years on from our PROMISE 2008 article. Objective: To assess the extent and types of techniques used to manage quality within data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets. Method: We update the 2008 mapping study through four subsequently published reviews and a snowballing exercise. Results: The original study located only 23 articles explicitly considering data quality. This picture has changed substantially as our updated review now finds 283 articles, however, our estimate is that this still represents perhaps 1% of the total empirical software engineering literature. Conclusions: It appears the community is now taking the issue of data quality more seriously and there is more work exploring techniques to automatically detect (and sometimes repair) noise problems. However, there is still little systematic work to evaluate the various data sets that are widely used for secondary analysis; addressing this would be of considerable benefit. It should also be a priority to work collab-oratively with practitioners to add new, higher quality data to the existing corpora.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121421528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Story Points from Issue Reports","authors":"S. Porru, Alessandro Murgia, S. Demeyer, M. Marchesi, R. Tonelli","doi":"10.1145/2972958.2972959","DOIUrl":"https://doi.org/10.1145/2972958.2972959","url":null,"abstract":"Estimating the effort of software engineering tasks is notoriously hard but essential for project planning. The agile community often adopts issue reports to describe tasks, and story points to estimate task effort. In this paper, we propose a machine learning classifier for estimating the story points required to address an issue. Through empirical evaluation on one industrial project and eight open source projects, we demonstrate that such classifier is feasible. We show that ---after an initial training on over 300 issue reports--- the classifier estimates a new issue in less than 15 seconds with a mean magnitude of relative error between 0.16 and 0.61. In addition, issue type, summary, description, and related components prove to be project dependent features pivotal for story point estimation.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129177647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden Markov Models for the Prediction of Developer Involvement Dynamics and Workload","authors":"V. Honsel, S. Herbold, J. Grabowski","doi":"10.1145/2972958.2972960","DOIUrl":"https://doi.org/10.1145/2972958.2972960","url":null,"abstract":"The evolution of software projects is driven by developers who are in control of the developed artifacts. When analyzing the behavior of developers, the observable behaviors are, e.g., commits, messages, or bug assignments. For defining dynamic activities and workload of developers, we consider underlying characteristics, which means the level of involvement according to their role in the project. In this paper, we propose to employ Hidden Markov Models (HMMs) to model this underlying behavior given the observable behavior as input. For this, we observe monthly commits, bugfixes, mailing list activity, and bug comments for each developer over the project duration. As output we get a model for each developer describing how likely it is to be in a low, medium, or high contribution state of every point in time. As a result, we discovered that same developer types exhibit similar models in terms of state patterns and transition matrices, which represent their involvement dynamics. Although the workload of the different developer roles related to this is more complex to model, we created a general model which performs nearly as well as individual developer contribution models. Moreover, to demonstrate the practical applicability, we present an example of the usage of our approach in project planning.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124128719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Search Based Training Data Selection For Cross Project Defect Prediction","authors":"Seyedrebvar Hosseini, Burak Turhan, M. Mäntylä","doi":"10.1145/2972958.2972964","DOIUrl":"https://doi.org/10.1145/2972958.2972964","url":null,"abstract":"Context: Previous studies have shown that steered training data or dataset selection can lead to better performance for cross project defect prediction (CPDP). On the other hand, data quality is an issue to consider in CPDP. Aim: We aim at utilising the Nearest Neighbor (NN)-Filter, embedded in a genetic algorithm, for generating evolving training datasets to tackle CPDP, while accounting for potential noise in defect labels. Method: We propose a new search based training data (i.e., instance) selection approach for CPDP called GIS (Genetic Instance Selection) that looks for solutions to optimize a combined measure of F-Measure and GMean, on a validation set generated by (NN)-filter. The genetic operations consider the similarities in features and address possible noise in assigned defect labels. We use 13 datasets from PROMISE repository in order to compare the performance of GIS with benchmark CPDP methods, namely (NN)-filter and naive CPDP, as well as with within project defect prediction (WPDP). Results: Our results show that GIS is significantly better than (NN)-Filter in terms of F-Measure (p -- value ≪ 0.001, Cohen's d = 0.697) and GMean (p -- value ≪ 0.001, Cohen's d = 0.946). It also outperforms the naive CPDP approach in terms of F-Measure (p -- value ≪ 0.001, Cohen's d = 0.753) and GMean (p -- value ≪ 0.001, Cohen's d = 0.994). In addition, the performance of our approach is better than that of WPDP, again considering F-Measure (p -- value ≪ 0.001, Cohen's d = 0.227) and GMean (p -- value ≪ 0.001, Cohen's d = 0.595) values. Conclusions: We conclude that search based instance selection is a promising way to tackle CPDP. Especially, the performance comparison with the within project scenario encourages further investigation of our approach. However, the performance of GIS is based on high recall in the expense of low precision. Using different optimization goals, e.g. targeting high precision, would be a future direction to investigate.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116006966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Manually Validated Code Refactoring Dataset and Its Assessment Regarding Software Maintainability","authors":"I. Kádár, Péter Hegedüs, R. Ferenc, T. Gyimóthy","doi":"10.1145/2972958.2972962","DOIUrl":"https://doi.org/10.1145/2972958.2972962","url":null,"abstract":"Refactoring is a popular technique for improving the internal structure of software systems. It has a solid theoretical background while being used in development practice at the same time. However, we lack empirical research results on the real effect of code refactoring and its ways of application. This paper presents a manually validated dataset of applied refactorings and source code metrics and maintainability of 7 open-source systems. It is a subset of our previously published dataset containing the refactoring instances automatically extracted by the RefFinder tool. We found that RefFinder had around 27% overall average precision on the subject systems, thus our new -- manually validated -- subset has substantial added value allowing researchers to perform more accurate empirical investigations. Using this data, we were able to study whether refactorings were really triggered by poor maintainability of the code, or by other aspects. The results show that source code elements subject to refactorings had significantly lower maintainability values (approximated by source code metric aggregation) than elements not affected by refactorings between two releases.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114618220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Evaluation of Distribution-based Thresholds for Internal Software Measures","authors":"L. Lavazza, S. Morasca","doi":"10.1145/2972958.2972965","DOIUrl":"https://doi.org/10.1145/2972958.2972965","url":null,"abstract":"Background Setting thresholds is important for the practical use of internal software measures, so software modules can be classified as having either acceptable or unacceptable quality, and software practitioners can take appropriate quality improvement actions. Quite a few methods have been proposed for setting thresholds and several of them are based on the distribution of an internal measure's values (and, possibly, other internal measures), without any explicit relationship with any external software quality of interest. Objective In this paper, we empirically investigate the consequences of defining thresholds on internal measures without taking into account the external measures that quantify qualities of practical interest. We focus on fault-proneness as the specific quality of practical interest. Method We analyzed datasets from the PROMISE repository. First, we computed the thresholds of code measures according to three distribution-based methods. Then, we derived statistically significant models of fault-proneness that use internal measures as independent variables. We then evaluated the indications provided by the distribution-based thresholds when used along with the fault-proneness models. Results Some methods for defining distribution-based thresholds requires that code measures be normally distributed. However, we found that this is hardly ever the case with the PROMISE datasets, making that entire class of methods inapplicable. We adapted these methods for non-normal distributions and obtained thresholds that appear reasonable, but are characterized by a large variation in the fault-proneness risk level they entail. Given a dataset, the thresholds for different internal measures---when used as independent variables of statistically significant models---provide fairly different values of fault-proneness. This is quite dangerous for practitioners, since they get thresholds that are presented as equally important, but practically can correspond to very different levels of user-perceivable quality. For other distribution-based methods, we found that the proposed thresholds are practically useless, as many modules with values of internal measures deemed acceptable according to the thresholds actually have high fault-proneness. Also, the accuracy of all of these methods appears to be lower than the accuracy obtained by simply estimating modules at random. Conclusions Our results indicate that distribution-based thresholds appear to be unreliable in providing sensible indications about the quality of software modules. 
Practitioners should instead use different kinds of threshold-setting methods, such as the ones that take into account data about the presence of faults in software modules, in addition to the values of internal software measures.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133828580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
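An illustrative sketch of the evaluation idea in the abstract above: derive a purely distribution-based threshold for a code measure, then check what fault-proneness a statistically fitted model assigns at that threshold. The quantile rule, the logistic model, and the synthetic data are assumptions, not the three methods evaluated in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
loc = rng.lognormal(mean=4, sigma=1, size=500)                 # hypothetical LOC per module (non-normal)
faulty = (rng.random(500) < 1 / (1 + np.exp(-(loc - 80) / 40))).astype(int)

threshold = np.quantile(loc, 0.7)                              # distribution-based threshold, no link to faults

model = LogisticRegression().fit(loc.reshape(-1, 1), faulty)   # fault-proneness model
risk_at_threshold = model.predict_proba([[threshold]])[0, 1]
print(f"threshold = {threshold:.1f} LOC, implied fault-proneness = {risk_at_threshold:.2f}")
```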
{"title":"Forecasting Communication Behavior in Student Software Projects","authors":"J. Klünder, Oliver Karras, Fabian Kortum, K. Schneider","doi":"10.1145/2972958.2972961","DOIUrl":"https://doi.org/10.1145/2972958.2972961","url":null,"abstract":"Communication is an essential part of software product development. Therefore, communication is an inevitable means for information sharing. For example, ill-communicated requirements, guidelines or decisions complicate working in a team and may threaten project success. Hence, monitoring communication behavior can help fostering project success by preventing loss of information due to insufficient communication. Knowledge about a team's communication behavior and information sharing enables the corresponding project leader to react. Forecasting communication behavior can indicate critical situations like too little communication, inappropriate media or wrong receivers at early project stages. A good forecast can identify if there is a need to change communication behavior. In a study with 165 students in 34 teams participating in a software project, we collected data concerning the used communication channels and perceived intensity. We combine these two parameters for analyzing and forecasting communication behavior. Considering the displayed evolution of communication behavior within a team can indicate the necessity to intervene. For example, the project leader can establish one more meeting each week to support information exchange. Our forecasting algorithm bases on k-nearest neighbor selection in order to identify comparable projects. We validate this approach using cross validation, which leads to an average accuracy of 90%. This level of accuracy may provide a reliable forecast and a good opportunity for early conflict identification.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114865187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring the Stylistic Inconsistency in Software Projects using Hierarchical Agglomerative Clustering","authors":"Qing Mi, J. Keung, Yang Yu","doi":"10.1145/2972958.2972963","DOIUrl":"https://doi.org/10.1145/2972958.2972963","url":null,"abstract":"Background: Although many software engineering methodologies and guidelines are provided, it is common that developers apply their very own programming styles to the source code being produced. These individually preferred programming styles are more comprehensive for themselves, but may well conflict with each other. Thus, the problem of stylistic inconsistency is inevitable during the software development process involving multiple developers, the result is undesirable and that will significantly degrade program readability and maintainability. Aims: Given limited understanding in this regard, we perform an empirical analysis for the purpose of quantitatively measuring the inconsistency degree of programming style within a software project team. Method: We first propose stylistic fingerprints, which are represented as a set of attribute-counting-metrics, in an attempt to characterize different programming styles. Then we adopt the hierarchical agglomerative clustering (HAC) technique to quantitatively measuring the proximity of programming style based on six C/C++ open source projects chosen from different application domains. Results: The empirical results demonstrate the feasibility and validity of our fingerprinting methodology. Moreover, the proposed clustering procedure utilizing HAC algorithm with dendrograms is capable of effectively illustrating the inconsistency degree of programming style among source files, which is significant for future research. Conclusions: This study proposed an effective and efficient approach for analyzing programming style inconsistency, supported by a sound theoretical basis for dealing with such a problem. Ultimately improving program readability and therefore reduce the maintenance overhead for software projects.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126758116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Terms Within- and Cross-Company in Software Effort Estimation","authors":"Leandro L. Minku","doi":"10.1145/2972958.2972968","DOIUrl":"https://doi.org/10.1145/2972958.2972968","url":null,"abstract":"Background: the terms Within-Company (WC) and Cross-Company (CC) in Software Effort Estimation (SEE) have the connotation that CC projects are considerably different from WC projects, and that WC projects are more similar to the projects being estimated. However, as WC projects can themselves be heterogeneous, this is not always the case. Therefore, the use of the terms WC and CC has been questioned as potentially misleading and possibly unhelpful. Aims: to raise awareness of the SEE community in terms of the problems presented by the terms WC and CC, and to encourage discussions on the appropriateness of these terms. Method: existing literature on CC and WC SEE is discussed to raise evidence in favour and against the use of these terms. Results: existing evidence suggests that the terms WC and CC are helpful, because distinguishing between WC and CC projects can help the predictive performance of SEE models. However, due to their connotation, they can be misleading and potentially lead to wrong conclusions in studies comparing WC and CC SEE models. Conclusions: the issue being tackled when investigating WC and CC SEE is heterogeneity, and not the different origins of the software projects per se. Given that the terms WC and CC can be misleading, researchers are encouraged to discuss and consider the problems presented by these terms in SEE papers. Labelling projects as \"potentially homogeneous\" and \"potentially heterogeneous\" may be safer than directly labelling them as WC and CC projects.","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123105432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the Popularity of GitHub Repositories","authors":"H. Borges, André C. Hora, M. T. Valente","doi":"10.1145/2972958.2972966","DOIUrl":"https://doi.org/10.1145/2972958.2972966","url":null,"abstract":"GitHub is the largest source code repository in the world. It provides a git-based source code management platform and also many features inspired by social networks. For example, GitHub users can show appreciation to projects by adding stars to them. Therefore, the number of stars of a repository is a direct measure of its popularity. In this paper, we use multiple linear regressions to predict the number of stars of GitHub repositories. These predictions are useful both to repository owners and clients, who usually want to know how their projects are performing in a competitive open source development market. In a large-scale analysis, we show that the proposed models start to provide accurate predictions after being trained with the number of stars received in the last six months. Furthermore, specific models---generated using data from repositories that share the same growth trends---are recommended for repositories with slow growth and/or for repositories with less stars. Finally, we evaluate the ability to predict not the number of stars of a repository but its rank among the GitHub repositories. We found a very strong correlation between predicted and real rankings (Spearman's rho greater than 0.95).","PeriodicalId":176848,"journal":{"name":"Proceedings of the The 12th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126737789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}