A simple guide to the use of Student's t-test, Mann-Whitney U test, Chi-squared test, and Kruskal-Wallis test in biostatistics

IF 6.1 · Zone 3, Biology · JCR Q1, Mathematical & Computational Biology

BioData Mining · Pub Date: 2025-08-20 · DOI: 10.1186/s13040-025-00465-6

Davide Chicco, Andrea Sichenze, Giuseppe Jurman

Citations: 0

Abstract

In an age when machine learning and artificial intelligence are broadly employed, traditional statistics can still provide insightful information and results quickly and at a low computational cost. Statistics, in fact, offers many useful tools to researchers, including a series of univariate statistical tests that can identify relationships between pairs of numeric samples: Student's t-test, Mann-Whitney U test, Chi-squared test, and Kruskal-Wallis test. These tests generate several outcomes, including probability values (p-values) that can express a numerical quantity which accepts or rejects the null hypothesis, based on a certain threshold used. Although effective, these tests are often misused or employed in the wrong contexts, especially among biostatistics studies. Many scientific researchers do not seem to know how to choose one test over the others, and this misuse can lead to incorrect results and wrong conclusions. Here we present a simple theoretical and practical guide to the use of these four tests, first describing their theoretical properties and then displaying the results obtained by applying these tests to real-world medical datasets. Eventually, we explain when and how to use each test based on the data types of the samples considered. Our study can have a strong impact on scientific research by potentially influencing future studies involving these tests. Our recommendations, in turn, can help researchers produce more reliable and sound scientific results, thus increasing the quality of multiple scientific studies across various fields.
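As a rough illustration of what two of the tests named in the abstract compute, here is a minimal standard-library sketch. It is not the authors' code and does not reproduce their recommendations: the function names, the 2x2 counts, and the 0.05 threshold are all assumptions made for the example. The Chi-squared part uses the Pearson statistic without continuity correction and relies on the fact that, with one degree of freedom, the upper-tail probability equals `erfc(sqrt(x / 2))`. The Mann-Whitney part only counts the U statistic and does not derive a p-value.

```python
import math


def chi_squared_2x2(table):
    """Pearson Chi-squared test of independence for a 2x2 table of counts.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    total = a + b + c + d
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, so the upper tail of the
    # Chi-squared distribution reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


ALPHA = 0.05  # assumed significance threshold, chosen only for this example

# Made-up counts: rows are two groups, columns are outcome yes / no.
counts = [[20, 10], [10, 20]]
stat, p = chi_squared_2x2(counts)
reject = p < ALPHA
print(f"chi-squared = {stat:.3f}, p = {p:.4f}, reject null: {reject}")

# Made-up numeric samples for the rank-based comparison.
print("U =", mann_whitney_u([4, 5, 6], [1, 2, 3]))
```

In practice a statistics library would normally be used instead of hand-written functions; the point here is only to show that the p-value is compared against a chosen threshold to decide whether to reject the null hypothesis.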

Source journal

BioData Mining (Mathematical & Computational Biology)

CiteScore: 7.90
Self-citation rate: 0.00%
Articles published:
Review time: 23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.