Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation最新文献

A machine learning based automatic folding of dynamically typed languages 基于机器学习的动态类型语言自动折叠

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-08-27 DOI: 10.1145/3340482.3342746

N. Viuginov, A. Filchenkov

引用次数: 2

On the role of data balancing for machine learning-based code smell detection 基于机器学习的代码气味检测中数据平衡的作用

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-08-27 DOI: 10.1145/3340482.3342744

Fabiano Pecorelli, D. D. Nucci, Coen De Roover, A. D. Lucia

{"title":"On the role of data balancing for machine learning-based code smell detection","authors":"Fabiano Pecorelli, D. D. Nucci, Coen De Roover, A. D. Lucia","doi":"10.1145/3340482.3342744","DOIUrl":"https://doi.org/10.1145/3340482.3342744","url":null,"abstract":"Code smells can compromise software quality in the long term by inducing technical debt. For this reason, many approaches aimed at identifying these design flaws have been proposed in the last decade. Most of them are based on heuristics in which a set of metrics (e.g., code metrics, process metrics) is used to detect smelly code components. However, these techniques suffer of subjective interpretation, low agreement between detectors, and threshold dependability. To overcome these limitations, previous work applied Machine Learning techniques that can learn from previous datasets without needing any threshold definition. However, more recent work has shown that Machine Learning is not always suitable for code smell detection due to the highly unbalanced nature of the problem. In this study we investigate several approaches able to mitigate data unbalancing issues to understand their impact on ML-based approaches for code smell detection. Our findings highlight a number of limitations and open issues with respect to the usage of data balancing in ML-based code smell detection.","PeriodicalId":254040,"journal":{"name":"Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121756098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 33

Leveraging mutants for automatic prediction of metamorphic relations using machine learning 利用机器学习利用突变体自动预测变形关系

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-08-27 DOI: 10.1145/3340482.3342741

Aravind Nair, K. Meinke, Sigrid Eldh

{"title":"Leveraging mutants for automatic prediction of metamorphic relations using machine learning","authors":"Aravind Nair, K. Meinke, Sigrid Eldh","doi":"10.1145/3340482.3342741","DOIUrl":"https://doi.org/10.1145/3340482.3342741","url":null,"abstract":"An oracle is used in software testing to derive the verdict (pass/fail) for a test case. Lack of precise test oracles is one of the major problems in software testing which can hinder judgements about quality. Metamorphic testing is an emerging technique which solves both the oracle problem and the test case generation problem by testing special forms of software requirements known as metamorphic requirements. However, manually deriving the metamorphic requirements for a given program requires a high level of domain expertise, is labor intensive and error prone. As an alternative, we consider the problem of automatic detection of metamorphic requirements using machine learning (ML). For this problem we can apply graph kernels and support vector machines (SVM). A significant problem for any ML approach is to obtain a large labeled training set of data (in this case programs) that generalises well. The main contribution of this paper is a general method to generate large volumes of synthetic training data which can improve ML assisted detection of metamorphic requirements. For training data synthesis we adopt mutation testing techniques. This research is the first to explore the area of data augmentation techniques for ML-based analysis of software code. We also have the goal to enhance black-box testing using white-box methodologies. Our results show that the mutants incorporated into the source code corpus not only efficiently scale the dataset size, but they can also improve the accuracy of classification models.","PeriodicalId":254040,"journal":{"name":"Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116250769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

Risk-based data validation in machine learning-based software systems 基于机器学习的软件系统中基于风险的数据验证

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-08-27 DOI: 10.1145/3340482.3342743

Harald Foidl, M. Felderer

{"title":"Risk-based data validation in machine learning-based software systems","authors":"Harald Foidl, M. Felderer","doi":"10.1145/3340482.3342743","DOIUrl":"https://doi.org/10.1145/3340482.3342743","url":null,"abstract":"Data validation is an essential requirement to ensure the reliability and quality of Machine Learning-based Software Systems. However, an exhaustive validation of all data fed to these systems (i.e. up to several thousand features) is practically unfeasible. In addition, there has been little discussion about methods that support software engineers of such systems in determining how thorough to validate each feature (i.e. data validation rigor). Therefore, this paper presents a conceptual data validation approach that prioritizes features based on their estimated risk of poor data quality. The risk of poor data quality is determined by the probability that a feature is of low data quality and the impact of this low (data) quality feature on the result of the machine learning model. Three criteria are presented to estimate the probability of low data quality (Data Source Quality, Data Smells, Data Pipeline Quality). To determine the impact of low (data) quality features, the importance of features according to the performance of the machine learning model (i.e. Feature Importance) is utilized. The presented approach provides decision support (i.e. data validation prioritization and rigor) for software engineers during the implementation of data validation techniques in the course of deploying a trained machine learning model and its software stack.","PeriodicalId":254040,"journal":{"name":"Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133519355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

Classifying non-functional requirements using RNN variants for quality software development 使用RNN变体为高质量软件开发分类非功能需求

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-08-27 DOI: 10.1145/3340482.3342745

Mohamed Abdur Rahman, Md. Ariful Haque, Md. Nurul Ahad Tawhid, Md. Saeed Siddik

{"title":"Classifying non-functional requirements using RNN variants for quality software development","authors":"Mohamed Abdur Rahman, Md. Ariful Haque, Md. Nurul Ahad Tawhid, Md. Saeed Siddik","doi":"10.1145/3340482.3342745","DOIUrl":"https://doi.org/10.1145/3340482.3342745","url":null,"abstract":"Non-Functional Requirements (NFR), a set of quality attributes, required for software architectural design. Which are usually scattered in SRS and must be extracted for quality software development to meet user expectations. Researchers show that functional and non-functional requirements are mixed together within the same SRS, which requires a mammoth effort for distinguishing them. Automatic NFR classification would be a feasible way to characterize those requirements, where several techniques have been recommended e.g. IR, linguistic knowledge, etc. However, conventional supervised machine learning methods suffered for word representation problem and usually required hand-crafted features, which will be overcome by proposed research using RNN variants to categories NFR. The NFR are interrelated and one task happens after another, which is the ideal situation for RNN. In this approach, requirements are processed to eliminate unnecessary contents, which are used to extract features using word2vec to fed as input of RNN variants LSTM and GRU. Performance has been evaluated using PROMISE dataset considering several statistical analyses. Among those models, precision, recall, and f1-score of LSTM validation are 0.973, 0.967 and 0.966 respectively, which is higher over CNN and GRU models. LSTM also correctly classified minimum 60% and maximum 80% unseen requirements. In addition, classification accuracy of LSTM is 6.1% better than GRU, which concluded that RNN variants can lead to better classification results, and LSTM is more suitable for NFR classification from textual requirements.","PeriodicalId":254040,"journal":{"name":"Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125820144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 27

Towards surgically-precise technical debt estimation: early results and research roadmap 迈向精确的技术债务评估:早期结果和研究路线图

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-08-02 DOI: 10.1145/3340482.3342747

Valentina Lenarduzzi, A. Martini, D. Taibi, D. Tamburri

引用次数: 31

SZZ unleashed: an open implementation of the SZZ algorithm - featuring example usage in a study of just-in-time bug prediction for the Jenkins project SZZ释放:SZZ算法的开放实现-在Jenkins项目的实时错误预测研究中提供示例使用

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 2019-03-05 DOI: 10.1145/3340482.3342742

Markus Borg, O. Svensson, Kristian Berg, Daniel Hansson

{"title":"SZZ unleashed: an open implementation of the SZZ algorithm - featuring example usage in a study of just-in-time bug prediction for the Jenkins project","authors":"Markus Borg, O. Svensson, Kristian Berg, Daniel Hansson","doi":"10.1145/3340482.3342742","DOIUrl":"https://doi.org/10.1145/3340482.3342742","url":null,"abstract":"Machine learning applications in software engineering often rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and conclude with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.","PeriodicalId":254040,"journal":{"name":"Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126073957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 69

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation 第三届ACM SIGSOFT软件质量评估机器学习技术国际研讨会论文集

Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation Pub Date : 1900-01-01 DOI: 10.1145/3340482

引用次数: 2