Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering: Latest Papers

An Evaluation of Parameter Pruning Approaches for Software Estimation

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345633

Thu D. Tran, Vu Nguyen, Thong Truong, C. Tran, Phu Le

Citations: 1

Applying Cross Project Defect Prediction Approaches to Cross-Company Effort Estimation

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345638

S. Amasaki, Tomoyuki Yokogawa, Hirohisa Aman

Abstract: BACKGROUND: Prediction systems in software engineering often suffer from a shortage of suitable data within a project. A promising solution is transfer learning, which utilizes data from outside the project. Many transfer learning approaches have been proposed for defect prediction, known as cross-project defect prediction (CPDP). In contrast, only a few approaches have been proposed for software effort estimation, known as cross-company software effort estimation (CCSEE). CCSEE and CPDP address a similar problem, and a few CPDP approaches are in fact applicable to CCSEE. Examining how well CPDP approaches perform as CCSEE approaches is therefore useful for improving CCSEE performance. AIMS: To explore how well CPDP approaches work as CCSEE approaches. METHOD: An empirical experiment was conducted to evaluate the performance of CPDP approaches in CCSEE. We examined 7 CPDP approaches, selected for their ease of application. These approaches were applied to 8 data sets, each of which consists of a few subsets from different domains. The estimation results were evaluated with a common performance measure called SA. RESULTS: Several CPDP approaches improved the estimation accuracy, though the degree of improvement was not large. CONCLUSIONS: A straightforward application of the selected CPDP approaches did not bring a clear effect. CCSEE may need specific transfer learning approaches for further improvement.

Citations: 1

Leveraging Change Intents for Characterizing and Identifying Large-Review-Effort Changes

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345635

Song Wang, Chetan Bansal, Nachiappan Nagappan, Adithya Abraham Philip

Abstract: Code changes to software occur for various reasons, such as bug fixing, new feature addition, and code refactoring. In most existing studies, the intent of the change is rarely leveraged to provide more specific, context-aware analysis. In this paper, we present the first study to leverage change intent to characterize and identify Large-Review-Effort (LRE) changes, that is, changes that require large review effort. Specifically, we first propose a feedback-driven and heuristics-based approach to obtain change intents. We then characterize the changes with respect to review effort by using various features extracted from change metadata and the change intents. We further explore the feasibility of automatically classifying LRE changes. We conduct our study on a large-scale project from Microsoft and three large-scale open source projects, i.e., Qt, Android, and OpenStack. Our results show that (i) code changes with some intents are more likely to be LRE changes, (ii) machine learning based prediction models can efficiently help identify LRE changes, and (iii) prediction models built for code changes with some intents achieve better performance than prediction models that do not consider the change intent; the improvement in AUC can be up to 19 percentage points and is 7.4 percentage points on average. The tool developed in this study has already been used at Microsoft to provide the review effort and intent information of changes to reviewers in order to accelerate the review process.

Citations: 14

From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345639

Renan Vieira, Antônio da Silva, L. Rocha, J. Gomes

Abstract: Bugs appear in almost any software development effort. Solving all or at least a large part of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems to report and monitor bug-fixing tasks. In recent years, several researchers have conducted bug tracking analyses to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed across 9 categories. We mined this information from the Jira issue tracking system, considering two different perspectives on reports with closed/resolved status: static (the latest version of the reports) and dynamic (the changes that have occurred in the reports over time). We also extract information from the commits (if they exist) that fix such bugs from the respective version-control system (Git). We also provide an analysis of the changes that occur in the reports as a way of illustrating and characterizing the proposed dataset. Since data extraction is an error-prone, nontrivial task, we believe initiatives like this can support researchers in further, more detailed investigations.

Citations: 19

Does chronology matter in JIT defect prediction?: A Partial Replication Study

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3351449

H. Jahanshahi, Dhanya Jothimani, Ayse Basar, Mucahit Cevik

Abstract: BACKGROUND: Just-In-Time (JIT) models, unlike traditional defect prediction models, detect fix-inducing changes (or defect-inducing changes). These models are designed on the assumption that past code change properties are similar to future ones. However, as the system evolves, the expertise of developers and/or the complexity of the system also change. AIM: In this work, we aim to investigate the effect of code change properties on JIT models over time. We also study the impact of using recent data as well as all available data on the performance of JIT models. Further, we analyze the effect of weighted sampling on the performance of fix-inducing properties of JIT models. For this purpose, we used datasets from four open-source projects, namely Eclipse JDT, Mozilla, Eclipse Platform, and PostgreSQL. METHOD: We used five families of code change properties: size, diffusion, history, experience, and purpose. We used Random Forest to train and test the JIT model, and Brier Score (BS) and Area Under Curve (AUC) for performance measurement. We applied the Wilcoxon Signed Rank Test to the output to statistically validate whether the performance of JIT models improves when using all the available data or the recent data. RESULTS: Our results suggest that the predictive power of JIT models does not change over time. Furthermore, we observed that the chronology of data in JIT defect prediction models can be discarded by considering all the available data. On the other hand, the importance score of the families of code change properties is found to oscillate over time. CONCLUSION: To mitigate the impact of the evolution of code change properties, it is recommended to use a weighted sampling approach in which more emphasis is placed upon the changes occurring closer to the current time. Moreover, since properties such as "Expertise of the Developer" and "Size" evolve over time, models obtained from old data may exhibit different characteristics compared to those employing the newer dataset. Hence, practitioners should constantly retrain JIT models to include fresh data.

Citations: 7
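
The abstract above measures JIT models with the Brier Score and recommends weighted sampling that emphasizes recent changes. As a minimal sketch of both ideas (not the authors' implementation), the Python below computes the Brier Score as the mean squared difference between predicted probabilities and binary outcomes, and derives sampling weights from the age of each change. The exponential decay, the `half_life_days` parameter, and the function names are assumptions made for illustration; the abstract does not specify a weighting scheme.

```python
from typing import List, Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means every prediction was both certain and correct.
    """
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def recency_weights(ages_in_days: Sequence[float], half_life_days: float = 180.0) -> List[float]:
    """Hypothetical weighting: a change loses half its weight every `half_life_days`."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


if __name__ == "__main__":
    predicted = [0.9, 0.2, 0.6]   # predicted probability that a change is fix-inducing
    actual = [1, 0, 1]            # 1 = the change was fix-inducing, 0 = it was not
    print(f"Brier score: {brier_score(predicted, actual):.3f}")
    print("Weights:", recency_weights([0, 30, 60], half_life_days=30))
```

Weights of this kind could be passed to a sampler such as `random.choices(changes, weights=weights, k=n)` when drawing a training set, so that newer changes are drawn more often than older ones.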

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629

Citations: 2

Prioritizing automated user interface tests using reinforcement learning

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345636

A. Nguyen, Bach Le, Vu Nguyen

Citations: 7

Which Refactoring Reduces Bug Rate?

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345631

Idan Amit, D. Feitelson

Citations: 6

Reviewer Recommendation using Software Artifact Traceability Graphs

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-09-18 DOI: 10.1145/3345629.3345637

Emre Sülün, Eray Tüzün, Ugur Dogrusöz

Abstract: Various types of artifacts (requirements, source code, test cases, documents, etc.) are produced throughout the lifecycle of a software system. These artifacts are often related to each other via traceability links that are stored in modern application lifecycle management repositories. Throughout the lifecycle, various types of changes can arise in any one of these artifacts. It is important to review such changes to minimize their potential negative impacts. To maximize the benefits of the review process, the reviewer(s) should be chosen appropriately. In this study, we reformulate the reviewer suggestion problem using software artifact traceability graphs. We introduce a novel approach, named RSTrace, to automatically recommend the reviewers that are best suited based on their familiarity with a given artifact. The proposed approach could, in theory, be applied to all types of artifacts. For the purpose of this study, we focused on the source code artifact and conducted an experiment on finding the appropriate code reviewer(s). We initially tested RSTrace on an open source project and achieved a top-3 recall of 0.85 with an MRR (mean reciprocal ranking) of 0.73. In a further empirical evaluation of 37 open source projects, we confirmed that the proposed reviewer recommendation approach yields promising top-k and MRR scores on average compared to existing reviewer recommendation approaches.

Citations: 16
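
The RSTrace abstract above reports top-k recall and MRR. As a rough illustration of how such scores are commonly computed for ranked reviewer recommendations, the Python sketch below assumes that each review has one ranked candidate list and a set of actual reviewers, and that a top-k "hit" means at least one actual reviewer appears among the first k candidates. These definitions, the function names, and the sample data are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], actual: Set[str]) -> float:
    """Return 1/position of the first correct candidate, or 0.0 if none is correct."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in actual:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(recommendations: Sequence[Sequence[str]],
                         ground_truth: Sequence[Set[str]]) -> float:
    """Average the reciprocal rank over all reviews."""
    scores = [reciprocal_rank(r, a) for r, a in zip(recommendations, ground_truth)]
    return sum(scores) / len(scores)


def top_k_recall(recommendations: Sequence[Sequence[str]],
                 ground_truth: Sequence[Set[str]], k: int) -> float:
    """Fraction of reviews with at least one actual reviewer in the top k candidates."""
    hits = sum(1 for r, a in zip(recommendations, ground_truth) if set(r[:k]) & a)
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical data: three reviews, each with a ranked list of candidate reviewers.
    recs = [["ann", "bob", "cy"], ["bob", "ann", "cy"], ["cy", "bob", "ann"]]
    truth = [{"ann"}, {"ann"}, {"dee"}]
    print("MRR:", mean_reciprocal_rank(recs, truth))
    print("top-2 recall:", top_k_recall(recs, truth, k=2))
```

In the sample data the correct reviewer is ranked first, second, and not at all, so the reciprocal ranks are 1, 0.5, and 0.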

The Technical Debt Dataset

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering Pub Date : 2019-08-02 DOI: 10.1145/3345629.3345630

Valentina Lenarduzzi, Nyyti Saarimäki, D. Taibi

Abstract: Technical Debt analysis is increasing in popularity as researchers and industry adopt various static code analysis tools to evaluate the quality of their code. Despite this, empirical studies on software projects are expensive because of the time needed to analyze the projects. In addition, the results are difficult to compare, as studies commonly consider different projects. In this work, we propose the Technical Debt Dataset, a curated set of project measurement data from 33 Java projects from the Apache Software Foundation. In the Technical Debt Dataset, we analyzed all commits from separately defined time frames with SonarQube to collect Technical Debt information and with Ptidej to detect code smells. Moreover, we extracted all available commit information from the git logs, the refactorings applied (using Refactoring Miner), and the fault information reported in the issue trackers (Jira). Using this information, we executed the SZZ algorithm to identify the fault-inducing and fault-fixing commits. We analyzed 78K commits from the selected 33 projects, detecting 1.8M SonarQube issues, 62K code smells, 28K faults, and 57K refactorings. The project analysis took more than 200 days. In this paper, we describe the data retrieval pipeline together with the tools used for the analysis. The dataset is made available through CSV files and an SQLite database to facilitate queries on the data. The Technical Debt Dataset aims to open up diverse opportunities for Technical Debt research, enabling researchers to compare results on common projects.

Citations: 63