Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering: Latest Publications

A Curated Benchmark Collection of Python Systems for Empirical Studies on Software Engineering

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810148

Matteo Orrù, E. Tempero, M. Marchesi, R. Tonelli, Giuseppe Destefanis

Citations: 21

A Replication of Comparative Study of Moving Windows on Linear Regression and Estimation by Analogy

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810153

S. Amasaki, C. Lokan

Abstract: Context: Recent studies have shown that estimation accuracy can be affected by only using a window of recent projects as training data for building an effort estimation model. The effect and its extent can be affected by effort estimation methods (e.g. linear regression (LR) or estimation by analogy (EbA)), windowing policies (fixed-size or fixed-duration), and between organizations. However, different effects between organizations have only been explored with LR as the estimation method, and different effects between estimation methods and windowing policies have mainly been explored with data from only one organization. Objective: To further investigate the effect on estimation accuracy of using windows, with different windowing policies, when using EbA as the estimation method. Also, to compare the effect of LR with EbA as an estimation method, when using windows. Method: Using a data set studied with LR in previous research, we examine the effects of using windows on the accuracy of effort estimates, using EbA with both fixed-size and fixed-duration windowing policies. Results: With this data set, fixed-size windows, no matter their size, do not improve the accuracy of estimates obtained using EbA. This reinforces previous research with this data set, which used LR as the estimation approach. However, fixed-duration windows can improve the accuracy of estimates obtained with EbA. This contradicts previous research with this data set, which used LR as the estimation approach. Variations in the settings for EbA can change the sizes at which windows are helpful. Conclusions: This study reinforces that the effect of using windows can be affected by the effort estimation approach, and by the windowing policy. Contrary to previous research, fixed-duration windows are found to be more helpful than fixed-size windows, and significant improvements are found with EbA that were not found with LR. Further research is needed to understand these differences.
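The two windowing policies named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the project records, the function names `fixed_size_window` and `fixed_duration_window`, and the window parameters are all hypothetical and chosen only to show how the two policies select training data differently.

```python
from datetime import date, timedelta

# Hypothetical project history: (completion_date, size, effort).
projects = [
    (date(2010, 1, 15), 100, 400),
    (date(2010, 6, 1), 150, 650),
    (date(2011, 2, 10), 80, 300),
    (date(2011, 9, 5), 200, 900),
    (date(2012, 3, 20), 120, 500),
]


def fixed_size_window(history, as_of, n):
    """Fixed-size policy: the n most recent projects completed before as_of."""
    finished = sorted(p for p in history if p[0] < as_of)
    return finished[-n:]


def fixed_duration_window(history, as_of, days):
    """Fixed-duration policy: projects completed within `days` days before as_of."""
    cutoff = as_of - timedelta(days=days)
    return [p for p in history if cutoff <= p[0] < as_of]


# Training sets for a new project starting on an assumed date.
target_start = date(2012, 6, 1)
recent_by_count = fixed_size_window(projects, target_start, 3)
recent_by_time = fixed_duration_window(projects, target_start, 365)
```

With this made-up history, the fixed-size window always yields three training projects regardless of their age, while the fixed-duration window yields however many projects finished in the preceding year. Either training set could then be passed to an LR or EbA estimator.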

Citations: 11

What is the Impact of Imbalance on Software Defect Prediction Performance?

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810150

Zaheed Mahmood, David Bowes, Peter Lane, T. Hall

Abstract: Software defect prediction performance varies over a large range. Menzies suggested there is a ceiling effect of 80% Recall [8]. Most of the data sets used are highly imbalanced. This paper asks, what is the empirical effect of using different datasets with varying levels of imbalance on predictive performance? We use data synthesised by a previous meta-analysis of 600 fault prediction models and their results. Four model evaluation measures (the Matthews Correlation Coefficient (MCC), F-Measure, Precision and Recall) are compared to the corresponding data imbalance ratio. When the data are imbalanced, the predictive performance of software defect prediction studies is low. As the data become more balanced, the predictive performance of prediction models increases, from an average MCC of 0.15, until the minority class makes up 20% of the instances in the dataset, where the MCC reaches an average value of about 0.34. As the proportion of the minority class increases above 20%, the predictive performance does not significantly increase. Using datasets with more than 20% of the instances being defective has not had a significant impact on the predictive performance when using MCC. We conclude that comparing the results of defect prediction studies should take into account the imbalance of the data.
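The four evaluation measures named in the abstract above are all derived from a binary confusion matrix. The following Python sketch uses the standard textbook definitions. The function name `evaluate` and the example counts are hypothetical and are not taken from the paper's data.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Compute Precision, Recall, F-Measure and MCC from confusion-matrix counts.

    Each measure falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Share of defective (minority-class) instances in the data set.
    minority_share = (tp + fn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
        "minority_share": minority_share,
    }


# Made-up example: 200 modules, 20 of them defective (10% minority class).
scores = evaluate(tp=10, fp=10, tn=170, fn=10)
```

Unlike Precision, Recall and F-Measure, MCC uses all four cells of the confusion matrix, including the true negatives.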

Citations: 33

Estimating the Value of Decisions Relating to Managing and Developing Software-intensive Products and Projects

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810154

E. Mendes, Burak Turhan, Pilar Rodríguez Marín, V. Freitas

Abstract: The software industry's current decision-making relating to product/project management and development is largely done in a value-neutral setting, in which cost is the primary driver for every decision taken. However, numerous studies have shown that the primary critical success factor that differentiates successful products/projects from failed ones lies in the value domain. Therefore, to remain competitive, innovative and to grow, companies must change from cost-based decision-making to value-based decision-making where the decisions taken are the best for that company's overall value creation. Our vision to tackle this problem and to provide a solution for value estimation is to employ a combination of qualitative and machine learning solutions where a probabilistic model encompassing the knowledge from different stakeholders will be used to predict the overall value of a given decision relating to product management and development. This vision drives the goal of a 3-year research project funded by the Finnish Funding Agency for Technology and Innovation (Tekes), with the participation of several industry partners.

Citations: 12

The JIRA Repository Dataset: Understanding Social Aspects of Software Development

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810147

Marco Ortu, Giuseppe Destefanis, Bram Adams, Alessandro Murgia, M. Marchesi, R. Tonelli

Abstract: Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and, recently, investigating developers' "affectiveness". In particular, the Jira Issue Tracking System is a proprietary tracking system that has gained tremendous popularity in recent years and offers unique features like the project management system and the Jira agile kanban board. This paper presents a dataset extracted from the Jira ITS of four popular open source ecosystems (as well as the tools and infrastructure used for extraction): the Apache Software Foundation, Spring, JBoss and CodeHaus communities. Our dataset hosts more than 1K projects, containing more than 700K issue reports and more than 2 million issue comments. Using this data, we have been able to deeply study the communication process among developers, and how this aspect affects the development process. Furthermore, comments posted by developers contain not only technical information, but also valuable information about sentiments and emotions. Since sentiment analysis and human aspects in software engineering have been gaining more and more importance in recent years, with this repository we would like to encourage further studies in this direction.

Citations: 69

Different Classifiers Find Different Defects Although With Different Level of Consistency

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810149

David Bowes, T. Hall, Jean Petrić

Abstract: BACKGROUND -- During the last 10 years hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. OBJECTIVE -- We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. METHOD -- We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in 12 NASA data sets. The defect predictions that each classifier makes are captured in a confusion matrix and the prediction uncertainty is compared against different classifiers. RESULTS -- Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. CONCLUSIONS -- Our results confirm that a unique sub-set of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Classifier ensembles with decision making strategies not based on majority voting are likely to perform best.
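The idea that each classifier detects a partly different set of defects can be sketched with plain set operations. This is an illustrative sketch only, not the paper's analysis procedure: the module IDs, the `found` mapping, and the helper `unique_finds` are hypothetical.

```python
# Hypothetical data: IDs of truly defective modules that each classifier
# correctly flagged (its true positives). The values are made up.
found = {
    "RandomForest": {1, 2, 3, 5},
    "NaiveBayes": {2, 3, 4, 8},
    "RPart": {1, 3, 6},
    "SVM": {3, 7},
}


def unique_finds(found_by_classifier):
    """For each classifier, return the defects that no other classifier found."""
    result = {}
    for name, ids in found_by_classifier.items():
        others = set().union(
            *(v for k, v in found_by_classifier.items() if k != name)
        )
        result[name] = ids - others
    return result


unique = unique_finds(found)
found_by_all = set.intersection(*found.values())
found_by_any = set().union(*found.values())
```

In this made-up case every classifier contributes at least one defect the others miss, while only one defect is found by all four. A majority vote would discard the uniquely found defects, which is one way to read the abstract's remark about ensembles not based on majority voting.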

Citations: 8

Size and cohesion metrics as indicators of the long method bad smell: An empirical study

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2015-10-21 DOI: 10.1145/2810146.2810155

Sofia Charalampidou, Apostolos Ampatzoglou, P. Avgeriou

Abstract: Source code bad smells are usually resolved through the application of well-defined solutions, i.e., refactorings. In the literature, software metrics are used as indicators of the existence and prioritization of resolving bad smells. In this paper, we focus on the long method smell (i.e. one of the most frequent and persistent bad smells) that can be resolved by the extract method refactoring. Until now, the identification of long methods or extract method opportunities has been performed based on cohesion, size or complexity metrics. However, the empirical validation of these metrics has exhibited relatively low accuracy with regard to their capacity to indicate the existence of long methods or extract method opportunities. Thus, we empirically explore the ability of size and cohesion metrics to predict the existence and the refactoring urgency of long method occurrences, through a case study on Java open-source methods. The results of the study suggest that one size and four cohesion metrics are capable of characterizing the need and urgency for resolving the long method bad smell, with a higher accuracy compared to the previous studies. The obtained results are discussed by providing possible interpretations and implications to practitioners and researchers.

Citations: 21

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 1900-01-01 DOI: 10.1145/2810146

Citations: 5