Automating Test Case Identification in Java Open Source Projects on GitHub

Matej Madeja, J. Porubän, M. Bačíková, Matúš Sulír, Ján Juhár, Sergej Chodarev, Filip Gurbáľ
Journal: Comput. Informatics. Published: 2021-02-23. DOI: 10.31577/cai_2021_3_575 (https://doi.org/10.31577/cai_2021_3_575)

Abstract

Software testing is one of the most important components of Quality Assurance (QA). Many researchers study the testing process in terms of tester motivation and how tests should or should not be written. However, these recommendations do not reveal how tests are actually written in real projects. In this paper, the following was investigated: (i) the denotation of the word "test" in different natural languages; (ii) whether the number of occurrences of the word "test" correlates with the number of test cases; and (iii) which testing frameworks are used most often. The analysis was performed on 38 GitHub open source repositories carefully selected from a set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there exists a weak correlation (r = 0.655) between the number of occurrences of the word "test" and the number of test cases in a class; (ii) the proposed algorithm using static file analysis correctly detected 97% of test cases; (iii) 15% of the analyzed classes used a main() function, which represents regular Java programs that test the production code without using any third-party framework. The identification of such tests is very complex due to implementation diversity. The results may be leveraged to identify and locate test cases in a repository more quickly, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.
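To make the two measured quantities concrete, the following is a minimal sketch in Java and not the authors' algorithm, which the abstract does not describe in detail. It assumes that a test case is marked by a JUnit-style `@Test` annotation and that occurrences of "test" are counted case-insensitively as substrings; the class name `TestWordSketch` and all method names are hypothetical. It counts both quantities in a source string and computes the Pearson correlation coefficient r over per-class counts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; names and matching rules are assumptions.
class TestWordSketch {
    // Any occurrence of "test", ignoring case (e.g. in FooTest, @Test, testAdd).
    private static final Pattern WORD =
            Pattern.compile("test", Pattern.CASE_INSENSITIVE);
    // Assumption: one JUnit-style @Test annotation marks one test case.
    private static final Pattern ANNOTATION = Pattern.compile("@Test\\b");

    // Counts non-overlapping matches of the pattern in the source text.
    private static int count(Pattern pattern, String source) {
        Matcher matcher = pattern.matcher(source);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    static int countTestWord(String source) {
        return count(WORD, source);
    }

    static int countTestCases(String source) {
        return count(ANNOTATION, source);
    }

    // Pearson correlation coefficient of two equally long samples.
    // Returns NaN when either sample has zero variance.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty and equally long");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        String source = "class FooTest {\n"
                + "  @Test void testAdd() {}\n"
                + "  @Test void testSub() {}\n"
                + "}\n";
        // Prints "5 occurrences, 2 test cases".
        System.out.println(countTestWord(source) + " occurrences, "
                + countTestCases(source) + " test cases");
    }
}
```

A purely annotation-based count like this would miss the tests the abstract describes as plain main() programs written without a third-party framework, which is one reason the word count and the test case count can diverge.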