{"title":"A Curated Benchmark Collection of Python Systems for Empirical Studies on Software Engineering","authors":"Matteo Orrù, E. Tempero, M. Marchesi, R. Tonelli, Giuseppe Destefanis","doi":"10.1145/2810146.2810148","DOIUrl":"https://doi.org/10.1145/2810146.2810148","url":null,"abstract":"The aim of this paper is to present a dataset of metrics associated with the first release of a curated collection of Python software systems. We describe the dataset along with the adopted criteria and the issues we faced while building such a corpus. This dataset can enhance the reliability of empirical studies, enable their reproducibility, reduce their cost, and foster further research on Python software.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115495356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Replication of Comparative Study of Moving Windows on Linear Regression and Estimation by Analogy","authors":"S. Amasaki, C. Lokan","doi":"10.1145/2810146.2810153","DOIUrl":"https://doi.org/10.1145/2810146.2810153","url":null,"abstract":"Context: Recent studies have shown that estimation accuracy can be affected by only using a window of recent projects as training data for building an effort estimation model. The effect and its extent can depend on the effort estimation method (e.g. linear regression (LR) or estimation by analogy (EbA)) and the windowing policy (fixed-size or fixed-duration), and can differ between organizations. However, different effects between organizations have only been explored with LR as the estimation method, and different effects between estimation methods and windowing policies have mainly been explored with data from only one organization. Objective: To further investigate the effect on estimation accuracy of using windows, with different windowing policies, when using EbA as the estimation method. Also, to compare the effect of LR with EbA as an estimation method, when using windows. Method: Using a data set studied with LR in previous research, we examine the effects of using windows on the accuracy of effort estimates, using EbA with both fixed-size and fixed-duration windowing policies. Results: With this data set, fixed-size windows, no matter their size, do not improve the accuracy of estimates obtained using EbA. This reinforces previous research with this data set, which used LR as the estimation approach. However, fixed-duration windows can improve the accuracy of estimates obtained with EbA. This contradicts previous research with this data set, which used LR as the estimation approach. Variations in the settings for EbA can change the sizes at which windows are helpful. Conclusions: This study reinforces that the effect of using windows can be affected by the effort estimation approach, and by the windowing policy. Contrary to previous research, fixed-duration windows are found to be more helpful than fixed-size windows, and significant improvements are found with EbA that were not found with LR. Further research is needed to understand these differences.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130703531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is the Impact of Imbalance on Software Defect Prediction Performance?","authors":"Zaheed Mahmood, David Bowes, Peter Lane, T. Hall","doi":"10.1145/2810146.2810150","DOIUrl":"https://doi.org/10.1145/2810146.2810150","url":null,"abstract":"Software defect prediction performance varies over a large range. Menzies suggested there is a ceiling effect of 80% Recall [8]. Most of the data sets used are highly imbalanced. This paper asks, what is the empirical effect of using different datasets with varying levels of imbalance on predictive performance? We use data synthesised by a previous meta-analysis of 600 fault prediction models and their results. Four model evaluation measures (the Matthews Correlation Coefficient (MCC), F-Measure, Precision and Recall) are compared to the corresponding data imbalance ratio. When the data are imbalanced, the predictive performance of software defect prediction studies is low. As the data become more balanced, the predictive performance of prediction models increases, from an average MCC of 0.15, until the minority class makes up 20% of the instances in the dataset, where the MCC reaches an average value of about 0.34. As the proportion of the minority class increases above 20%, the predictive performance does not significantly increase. Using datasets with more than 20% of the instances being defective has not had a significant impact on the predictive performance when using MCC. We conclude that comparing the results of defect prediction studies should take into account the imbalance of the data.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122151900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating the Value of Decisions Relating to Managing and Developing Software-intensive Products and Projects","authors":"E. Mendes, Burak Turhan, Pilar Rodríguez Marín, V. Freitas","doi":"10.1145/2810146.2810154","DOIUrl":"https://doi.org/10.1145/2810146.2810154","url":null,"abstract":"The software industry's current decision-making relating to product/project management and development is largely done in a value-neutral setting, in which cost is the primary driver for every decision taken. However, numerous studies have shown that the primary critical success factor that differentiates successful products/projects from failed ones lies in the value domain. Therefore, to remain competitive, innovative and to grow, companies must change from cost-based decision-making to value-based decision-making where the decisions taken are the best for that company's overall value creation. Our vision to tackle this problem and to provide a solution for value estimation is to employ a combination of qualitative and machine learning solutions where a probabilistic model encompassing the knowledge from different stakeholders will be used to predict the overall value of a given decision relating to product management and development. This vision drives the goal of a 3-year research project funded by the Finnish Funding Agency for Technology and Innovation (Tekes), with the participation of several industry partners.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"235 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132937473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The JIRA Repository Dataset: Understanding Social Aspects of Software Development","authors":"Marco Ortu, Giuseppe Destefanis, Bram Adams, Alessandro Murgia, M. Marchesi, R. Tonelli","doi":"10.1145/2810146.2810147","DOIUrl":"https://doi.org/10.1145/2810146.2810147","url":null,"abstract":"Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and, recently, investigating developers' \"affectiveness\". In particular, the Jira Issue Tracking System is a proprietary tracking system that has gained tremendous popularity in recent years and offers unique features like the project management system and the Jira agile kanban board. This paper presents a dataset extracted from the Jira ITS of four popular open source ecosystems (as well as the tools and infrastructure used for extraction): the Apache Software Foundation, Spring, JBoss and CodeHaus communities. Our dataset hosts more than 1K projects, containing more than 700K issue reports and more than 2 million issue comments. Using this data, we have been able to deeply study the communication process among developers, and how this aspect affects the development process. Furthermore, comments posted by developers contain not only technical information, but also valuable information about sentiments and emotions. Since sentiment analysis and human aspects in software engineering have been gaining more and more importance in recent years, with this repository we would like to encourage further studies in this direction.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126279098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Different Classifiers Find Different Defects Although With Different Level of Consistency","authors":"David Bowes, T. Hall, Jean Petrić","doi":"10.1145/2810146.2810149","DOIUrl":"https://doi.org/10.1145/2810146.2810149","url":null,"abstract":"BACKGROUND -- During the last 10 years hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. OBJECTIVE -- We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. METHOD -- We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in 12 NASA data sets. The defect predictions that each classifier makes are captured in a confusion matrix and the prediction uncertainty is compared against different classifiers. RESULTS -- Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. CONCLUSIONS -- Our results confirm that a unique sub-set of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Classifier ensembles with decision making strategies not based on majority voting are likely to perform best.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133192809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Size and cohesion metrics as indicators of the long method bad smell: An empirical study","authors":"Sofia Charalampidou, Apostolos Ampatzoglou, P. Avgeriou","doi":"10.1145/2810146.2810155","DOIUrl":"https://doi.org/10.1145/2810146.2810155","url":null,"abstract":"Source code bad smells are usually resolved through the application of well-defined solutions, i.e., refactorings. In the literature, software metrics are used as indicators of the existence and prioritization of resolving bad smells. In this paper, we focus on the long method smell (i.e. one of the most frequent and persistent bad smells) that can be resolved by the extract method refactoring. Until now, the identification of long methods or extract method opportunities has been performed based on cohesion, size or complexity metrics. However, the empirical validation of these metrics has exhibited relatively low accuracy with regard to their capacity to indicate the existence of long methods or extract method opportunities. Thus, we empirically explore the ability of size and cohesion metrics to predict the existence and the refactoring urgency of long method occurrences, through a case study on Java open-source methods. The results of the study suggest that one size and four cohesion metrics are capable of characterizing the need and urgency for resolving the long method bad smell, with a higher accuracy than in previous studies. The obtained results are discussed by providing possible interpretations and implications to practitioners and researchers.","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122369626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","authors":"","doi":"10.1145/2810146","DOIUrl":"https://doi.org/10.1145/2810146","url":null,"abstract":"","PeriodicalId":189774,"journal":{"name":"Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134450647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}