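Good Statistical Practices in Agronomy Using Categorical Data Analysis, with Alfalfa Examples Having Poisson and Binomial Underlying Distributions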

GM crops | Pub Date: 2022-05-13 | DOI: 10.3390/crops2020012

R. Mowers, B. Bucciarelli, Yuanyuan Cao, D. Samac, Zhanyou Xu


Citations: 0

Abstract



Categorical data derived from qualitative classifications or countable quantitative data are common in biological scientific work and crop breeding. Categorical data analyses are important for drawing correct inferences from experiments. However, categorical data can introduce unique issues in data analysis. This paper discusses common problems arising from categorical variable analysis and modeling, demonstrates the issues or risks of misapplying analysis, and suggests approaches to address data analysis challenges using two data sets from alfalfa breeding programs. For each data set, we present several analysis methods, e.g., simple t-test, analysis of variance (ANOVA), split plot analysis, generalized linear model (glm), generalized linear mixed model (glmm) using R with R markdown, and with the standard statistical analysis software SAS/JMP. The goal is to demonstrate good analysis practices for categorical data by comparing the potential ‘bad’ analyses with better ones, avoiding too much reliance on reaching a significant p-value of 0.05, and navigating the morass of ever-increasing numbers of potential R functions. The three main aspects of this research focus on choosing the right data distribution to use, using the correct error terms for hypothesis test p-values including the right type of sum of the squares (Type I, II, and III), and proper statistical models for categorical data analysis. Our results show the importance of good statistical analysis practice to help agronomists, breeders, and other researchers apply appropriate statistical approaches to draw more accurate conclusions from their data.
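The point about choosing the right data distribution can be illustrated with a minimal sketch. It is not the authors' code (the abstract says they use R and SAS/JMP), and the counts below are hypothetical, not taken from the alfalfa data sets. For a single two-level treatment factor, a Poisson log-link generalized linear model has a closed-form likelihood-ratio test, so the sketch needs only the Python standard library. The function name `poisson_lrt` is an assumption introduced here for illustration.

```python
import math


def poisson_lrt(group_a, group_b):
    """Likelihood-ratio test for equal Poisson means in two groups.

    Equivalent to comparing a one-factor Poisson GLM against an
    intercept-only model; the fitted means are simply the group means.
    Returns (deviance_difference, p_value) with 1 degree of freedom.
    """
    groups = [group_a, group_b]
    total = sum(sum(g) for g in groups)
    n_obs = sum(len(g) for g in groups)
    pooled_mean = total / n_obs

    deviance = 0.0
    for g in groups:
        group_sum = sum(g)
        if group_sum > 0:  # a group of all zeros contributes nothing
            group_mean = group_sum / len(g)
            deviance += 2.0 * group_sum * math.log(group_mean / pooled_mean)

    # Chi-square survival function with 1 df: P(Z^2 > d) = erfc(sqrt(d / 2)).
    p_value = math.erfc(math.sqrt(max(deviance, 0.0) / 2.0))
    return deviance, p_value


# Hypothetical counts per plot for two treatments (illustration only).
control = [2, 3, 1, 4, 2, 3]
treated = [5, 7, 6, 4, 8, 6]

deviance, p_value = poisson_lrt(control, treated)
print(f"deviance difference = {deviance:.3f}, p = {p_value:.4f}")
```

Unlike a pooled-variance t-test, this test does not assume equal variances across groups, which matters because the variance of a Poisson count grows with its mean. For designs with blocks, split plots, or random effects, a full glm or glmm fit in a statistics package would be needed instead of this closed form.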

