Hongyu Zhai, Casey Casalnuovo, Premkumar T. Devanbu
{"title":"Test Coverage in Python Programs","authors":"Hongyu Zhai, Casey Casalnuovo, Premkumar T. Devanbu","doi":"10.1109/MSR.2019.00027","DOIUrl":null,"url":null,"abstract":"We study code coverage in several popular Python projects: flask, matplotlib, pandas, scikit-learn, and scrapy. Coverage data on these projects is gathered and hosted on the Codecov website, from where this data can be mined. Using this data, and a syntactic parse of the code, we examine the effect of control flow structure, statement type (e.g., if, for) and code age on test coverage. We find that coverage depends on control flow structure, with more deeply nested statements being significantly less likely to be covered. This is a clear effect, which holds up in every project, even when controlling for the age of the line (as determined by git blame). We find that the age of a line per se has a small (but statistically significant) positive effect on coverage. Finally, we find that the kind of statement (try, if, except, raise, etc) has varying effects on coverage, with exception-handling statements being covered much less often. These results suggest that developers in Python projects have difficulty writing test sets that cover deeply-nested and error-handling statements, and might need assistance covering such code.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"51 1","pages":"116-120"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSR.2019.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
We study code coverage in several popular Python projects: flask, matplotlib, pandas, scikit-learn, and scrapy. Coverage data on these projects is gathered and hosted on the Codecov website, from where this data can be mined. Using this data, and a syntactic parse of the code, we examine the effect of control flow structure, statement type (e.g., if, for) and code age on test coverage. We find that coverage depends on control flow structure, with more deeply nested statements being significantly less likely to be covered. This is a clear effect, which holds up in every project, even when controlling for the age of the line (as determined by git blame). We find that the age of a line per se has a small (but statistically significant) positive effect on coverage. Finally, we find that the kind of statement (try, if, except, raise, etc) has varying effects on coverage, with exception-handling statements being covered much less often. These results suggest that developers in Python projects have difficulty writing test sets that cover deeply-nested and error-handling statements, and might need assistance covering such code.