Applying Bayesian Analysis Guidelines to Empirical Software Engineering Data: The Case of Programming Languages and Code Quality
Carlo A. Furia, R. Torkar, R. Feldt
ACM Transactions on Software Engineering and Methodology (TOSEM), pp. 1-38. Published 2021-01-29.
DOI: https://doi.org/10.1145/3490953
Citations: 13
Abstract
Statistical analysis is the tool of choice to turn data into information and then information into empirical knowledge. However, the process that goes from data to knowledge is long, uncertain, and riddled with pitfalls. To be valid, it should be supported by detailed, rigorous guidelines that help ferret out issues with the data or model and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such guidelines are being developed by statisticians to support the latest techniques for Bayesian data analysis. In this article, we frame these guidelines in a way that is suited to empirical research in software engineering. To demonstrate the guidelines in practice, we apply them to reanalyze a GitHub dataset about code quality in different programming languages. The dataset’s original analysis [Ray et al. 55] and a critical reanalysis [Berger et al. 6] have attracted considerable attention, in no small part because they target a topic (the impact of different programming languages) on which strong opinions abound. The goals of our reanalysis are largely orthogonal to this previous work, as we are concerned with demonstrating, on data in an interesting domain, how to build a principled Bayesian data analysis and to showcase its benefits. In the process, we will also shed light on some critical aspects of the analyzed data and of the relationship between programming languages and code quality, such as the impact of project-specific characteristics other than the programming language used. The high-level conclusions of our exercise will be that Bayesian statistical techniques can be applied to analyze software engineering data in a way that is principled, flexible, and leads to convincing results that inform the state of the art while highlighting the boundaries of its validity. The guidelines can support building solid statistical analyses and connecting their results. Thus, they can help buttress continued progress in empirical software engineering research.