Testing for normality in regression models: mistakes abound (but may not matter).

IF 2.9; Region 3 (multidisciplinary journals); JCR Q1 (MULTIDISCIPLINARY SCIENCES)

Royal Society Open Science. Pub Date: 2025-04-30. eCollection Date: 2025-04-01. DOI: 10.1098/rsos.241904

Stephen Midway, J Wilson White

Citations: 0

Abstract

This study examines the misuse of normality tests in linear regression within ecology and biology, focusing on common misconceptions. A bibliometric review found that over 70% of ecology papers and 90% of biology papers incorrectly applied normality tests to raw data instead of model residuals. To assess the impact of this error, we simulated datasets with normal, interval, and skewed distributions across various sample and effect sizes. We compared statistical power between two approaches: testing the whole dataset for normality (incorrect) versus testing model residuals (correct) to determine whether to use a parametric (t-test) or nonparametric (Mann-Whitney U test) method. Our results showed minimal differences in statistical power between the approaches, even when normality was incorrectly tested on raw data. However, when residuals violated the normality assumption, using the Mann-Whitney U test increased statistical power by 3-4%. Overall, the study suggests that, while correctly testing residuals for normality enhances model performance, the impact of testing raw data is negligible in terms of power loss, especially with large sample sizes. The findings highlight the need for more awareness of proper statistical practices, especially in evaluating the assumptions of linear models.
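The distinction between testing raw data and testing residuals can be illustrated with a small sketch. It is not the authors' simulation: the abstract does not say which normality test was used, so the Jarque-Bera test is swapped in here because its p-value can be computed with the Python standard library alone. The group sizes, the 4-standard-deviation effect, the seed, and the helper names (`jarque_bera_pvalue`, `residuals`, `pick_test`) are all illustrative assumptions.

```python
import math
import random
import statistics


def jarque_bera_pvalue(values):
    """Jarque-Bera normality test p-value (asymptotic chi-square, 2 df)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # The chi-square survival function with 2 df is exactly exp(-x / 2).
    return math.exp(-jb / 2.0)


def residuals(groups):
    """Residuals of a group-means model: each value minus its own group mean."""
    out = []
    for group in groups:
        group_mean = statistics.fmean(group)
        out.extend(v - group_mean for v in group)
    return out


def pick_test(p_normality, alpha=0.05):
    """Decision rule from the abstract: parametric if normality is not rejected."""
    return "t-test" if p_normality > alpha else "Mann-Whitney U"


# Assumed scenario: two groups with normal errors and a large mean difference.
rng = random.Random(1)
control = [rng.gauss(0.0, 1.0) for _ in range(200)]
treated = [rng.gauss(4.0, 1.0) for _ in range(200)]

# Incorrect practice: test the pooled response, which is bimodal here.
p_raw = jarque_bera_pvalue(control + treated)
# Correct practice: test the residuals after removing each group mean.
p_resid = jarque_bera_pvalue(residuals([control, treated]))

print(f"raw data:  p = {p_raw:.2g} -> {pick_test(p_raw)}")
print(f"residuals: p = {p_resid:.2g} -> {pick_test(p_resid)}")
```

With a large effect, the pooled response is a mixture of two normal distributions and fails the normality check even though the model errors are normal by construction, so the raw-data approach steers the analyst toward the nonparametric test. Whether that detour costs any power is the question the study's simulations address.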

Source journal: Royal Society Open Science (Multidisciplinary)

CiteScore: 6.00

Self-citation rate: 0.00%

Articles published: 508

Review time: 14 weeks

About the journal: Royal Society Open Science is a new open journal publishing high-quality original research across the entire range of science on the basis of objective peer-review. The journal covers the entire range of science and mathematics and will allow the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.