Software Defect Data Collection Framework for Github

2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Pub Date: 2022-01-27
DOI: 10.1109/confluence52989.2022.9734131

Vikas Suhag, S. Dubey, Bhupendra Kumar Sharma


Citations: 0

Abstract

Software has become part of every sphere of life. This growing dependence has put tremendous pressure on development teams to deliver applications as early as possible, often at the cost of software quality and reliability. Assuring quality requires extensive testing and validation, which is not feasible with limited human resources, time, and budget, so researchers have turned to a new paradigm of software quality assurance: Software Defect Prediction (SDP). SDP aims to build automated Machine Learning (ML) models that help development teams prioritize the key aspects of software testing while keeping the software development life cycle short. SDP requires a large amount of data to train and test ML models. Traditionally, the PROMISE and NASA defect datasets have been the most widely used by researchers, but changes in programming languages and programming styles, together with the limited size of these datasets, have made them infeasible for SDP in current scenarios. In this paper, we develop a software defect dataset collection framework that mines commit-level defect data from GitHub. The efficiency of data mining and the accuracy and validity of the data are verified using SDP models. Results show that the proposed method is feasible and efficient to execute even on regular computer systems.
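The abstract does not say how the framework decides which commits are defect-related. As a purely illustrative sketch of what commit-level defect labeling can look like, the snippet below tags commits whose messages contain fix-related keywords, a common heuristic in repository mining. The keyword list, the `CommitRecord` type, and the function names are all hypothetical and are not taken from the paper; a real pipeline would read commits from a cloned repository or the GitHub API rather than from a hard-coded sample.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword heuristic: the paper's actual labeling rule is not
# given in the abstract, so this pattern is an assumption for illustration.
FIX_PATTERN = re.compile(
    r"\b(fix(es|ed)?|bugs?|defects?|faults?|patch(ed)?)\b", re.IGNORECASE
)


@dataclass
class CommitRecord:
    """Minimal stand-in for one mined commit (hypothetical structure)."""
    sha: str
    message: str
    files_changed: int


def label_commit(commit: CommitRecord) -> dict:
    """Return one dataset row with a binary defect-fix label."""
    is_fix = bool(FIX_PATTERN.search(commit.message))
    return {
        "sha": commit.sha,
        "files_changed": commit.files_changed,
        "defect_fix": int(is_fix),
    }


def build_dataset(commits):
    """Label every commit and return the rows as a list of dicts."""
    return [label_commit(c) for c in commits]


# Hard-coded sample standing in for commits mined from a repository.
sample = [
    CommitRecord("a1b2c3", "Fix null pointer in parser", 2),
    CommitRecord("d4e5f6", "Add README section on setup", 1),
]
rows = build_dataset(sample)
```

Rows produced this way could be written to CSV and used as training data for an SDP classifier. Keyword matching is noisy, so any labels it yields would need validation before being trusted.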

