Comparison of two independent populations of compositional data with positive correlations among components using a nested Dirichlet distribution
Authors: Jacob A Turner, Bianca A Luedeker, Monnie McGee
Journal: Psychological Methods (impact factor 7.6; JCR Q1, Psychology, Multidisciplinary)
Publication date: 2025-01-16
DOI: 10.1037/met0000702 (https://doi.org/10.1037/met0000702)
Citations: 0
Abstract
Compositional data are multivariate data made up of components that sum to a fixed value. Often the data are presented as proportions of a whole, where the value of each component is constrained to be between 0 and 1 and the sum of the components is 1. There are many applications in psychology and other disciplines that yield compositional data sets including Morris water maze experiments, psychological well-being scores, analysis of daily physical activity times, and components of household expenditures. Statistical methods exist for compositional data and typically consist of two approaches. The first is to use transformation strategies, such as log ratios, which can lead to results that are challenging to interpret. The second involves using an appropriate distribution, such as the Dirichlet distribution, that captures the key characteristics of compositional data, and allows for ready interpretation of downstream analysis. Unfortunately, the Dirichlet distribution has constraints on variance and correlation that render it inappropriate for some applications. As a result, practicing researchers will often resort to standard two-sample t test or analysis of variance models for each variable in the composition to detect differences in means. We show that a recently published method using the Dirichlet distribution can drastically inflate Type I error rates, and we introduce a global two-sample test to detect differences in mean proportion of components for two independent groups where both groups are from either a Dirichlet or a more flexible nested Dirichlet distribution. We also derive confidence interval formulas for individual components for post hoc testing and further interpretation of results. We illustrate the utility of our methods using a recent Morris water maze experiment and human activity data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
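The abstract's remark that the Dirichlet distribution constrains correlation can be made concrete: for a plain Dirichlet vector, every pair of distinct components has negative covariance, so positively correlated components cannot be modeled. The sketch below is an illustration under stated assumptions, not the authors' method. It computes the theoretical Dirichlet correlation matrix and then simulates one simple two-level nested construction, in which one top-level share is split by a second Dirichlet. The function names, parameter values, and the specific nesting layout are hypothetical choices for illustration and may differ from the parameterization used in the paper.

```python
import numpy as np


def dirichlet_correlation(alpha):
    """Theoretical correlation matrix of a Dirichlet(alpha) vector."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    # Cov(X_i, X_j) = (delta_ij * mean_i - mean_i * mean_j) / (a0 + 1),
    # so every off-diagonal entry is negative.
    cov = (np.diag(mean) - np.outer(mean, mean)) / (a0 + 1.0)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def sample_nested_dirichlet(rng, top_alpha, nest_alpha, size):
    """Hypothetical two-level nesting: the first top-level share is
    split among the nested components by a second Dirichlet."""
    top = rng.dirichlet(top_alpha, size=size)     # (nest total, other components)
    split = rng.dirichlet(nest_alpha, size=size)  # shares within the nest
    nested = top[:, [0]] * split                  # components inside the nest
    return np.hstack([nested, top[:, 1:]])        # each row still sums to 1


rng = np.random.default_rng(0)

# Plain Dirichlet: all pairwise correlations are negative.
plain_corr = dirichlet_correlation([2.0, 3.0, 5.0])

# Nested sketch: a highly variable nest total with a stable split inside it
# makes the two nested components move together.
x = sample_nested_dirichlet(rng, top_alpha=[1.0, 1.0],
                            nest_alpha=[50.0, 50.0], size=20000)
nested_corr = np.corrcoef(x, rowvar=False)

print(np.round(plain_corr, 3))
print(np.round(nested_corr, 3))
```

In this sketch the first two components of the nested sample share the same nest total, which is what allows their correlation to be positive while the composition still sums to 1.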
About the journal:
Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.