Georgia D Tomova, Rosemary Walmsley, Laurie Berrie, Michelle A Morris, Peter W G Tennant
BMC Medical Research Methodology, 25(1): 100. Published 2025-04-17. doi:10.1186/s12874-025-02509-1
A comparison of methods for analysing compositional data with fixed and variable totals: a simulation study using the examples of time-use and dietary data.
Background: Compositional data comprise the parts of a 'whole' (or 'total'), which sum to that 'whole'. The 'whole' may vary between units of analysis, or it may be fixed (constant). For example, total energy intake (a variable total) is the sum of intake from all foods or macronutrients, whereas total time in a day (a fixed total) is the sum of time spent in various activities. Several approaches exist for analysing compositional data, including the isocaloric or isotemporal model, ratio variables, and compositional data analysis (CoDA). Although these approaches have been compared previously, such comparisons have used only real data. Because the true relationships are unknown in real data, those comparisons cannot show how well each model estimates a known effect. We use simulated data with different parametric relationships to explore and demonstrate the performance of each approach under a range of conditions.
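CoDA usually represents a composition by log-ratio coordinates rather than by the raw parts. The sketch below computes one common choice, pivot isometric log-ratio (ilr) coordinates, with NumPy. The function name `pivot_ilr` and the example day (sleep, sedentary time, light activity, and moderate-to-vigorous physical activity, MVPA) are hypothetical, and the abstract does not state which ilr basis the authors used; the point is only that the coordinates depend on the ratios between parts, not on their absolute scale.

```python
import numpy as np


def pivot_ilr(x):
    """Pivot isometric log-ratio (ilr) coordinates of one composition.

    x: 1-D array of D strictly positive parts (e.g. minutes per activity).
    Returns D-1 coordinates; coordinate i contrasts part i with the
    geometric mean of all parts after it.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 2 or np.any(x <= 0):
        raise ValueError("x must be a 1-D array of at least two positive parts")
    D = x.size
    logx = np.log(x)
    z = np.empty(D - 1)
    for i in range(D - 1):
        k = D - i - 1                      # number of parts after position i
        log_gm_rest = logx[i + 1:].mean()  # log geometric mean of those parts
        z[i] = np.sqrt(k / (k + 1)) * (logx[i] - log_gm_rest)
    return z


# Hypothetical day: sleep, sedentary, light activity, MVPA (minutes, sum 1440)
day = np.array([480.0, 600.0, 300.0, 60.0])
z = pivot_ilr(day)
# Scale invariance: minutes and proportions of the day give the same coordinates
print(np.allclose(z, pivot_ilr(day / day.sum())))  # True
```

Because the basis is orthonormal, the Euclidean length of `z` equals the length of the centred log-ratio vector, which is what makes the transformation isometric.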
Methods: We simulated physical activity time-use data and dietary data as examples of compositional data with fixed and variable totals, respectively, using three parametric relationships between the compositional components and the outcome (fasting plasma glucose): linear, log₂, and isometric log-ratio. Under each parametric scenario, we evaluated how well a range of generalised linear and additive models, as well as CoDA, estimated a 1-unit reallocation and a larger reallocation of either 10 units (physical activity) or 100 units (dietary data). We simulated 10,000 datasets with 1,000 observations each.
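For the fixed-total case, the simulation design and the isotemporal model can be sketched as follows. This is a minimal sketch under assumptions not taken from the abstract: four hypothetical activities generated from a Dirichlet distribution and scaled to 1,440 minutes, hypothetical linear coefficients for fasting glucose, the leave-one-out form of the isotemporal model fitted by ordinary least squares, and 200 replicates instead of the study's 10,000. Omitting sedentary time from the model makes the MVPA coefficient the effect of moving one minute from sedentary time to MVPA, so a 10-minute reallocation is ten times that coefficient.

```python
import numpy as np

TOTAL_MIN = 1440  # fixed total: minutes in a day
# Hypothetical "true" linear effects on fasting glucose (mmol/L per minute)
BETA = {"sleep": -0.001, "sedentary": 0.002, "light": -0.001, "mvpa": -0.010}
ALPHA = np.array([8.0, 10.0, 5.0, 1.0])  # hypothetical Dirichlet shape, same order


def simulate(rng, n=1000):
    """Simulate one fixed-total time-use dataset with a linear outcome."""
    parts = rng.dirichlet(ALPHA, size=n) * TOTAL_MIN  # each row sums to 1440
    b = np.array(list(BETA.values()))
    y = 5.0 + parts @ b + rng.normal(0.0, 0.5, size=n)
    return parts, y


def isotemporal_estimate(parts, y, minutes=10):
    """Leave-one-out (isotemporal) OLS: sedentary time is omitted, so the
    MVPA coefficient is the effect of moving 1 min from sedentary to MVPA."""
    sleep, _sedentary, light, mvpa = parts.T
    X = np.column_stack([np.ones_like(y), sleep, light, mvpa])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return minutes * coef[3]


rng = np.random.default_rng(2025)
estimates = np.array([isotemporal_estimate(*simulate(rng)) for _ in range(200)])
true_effect = 10 * (BETA["mvpa"] - BETA["sedentary"])
print(f"true: {true_effect:.3f}  mean estimate: {estimates.mean():.3f}  "
      f"empirical SD: {estimates.std(ddof=1):.4f}")
```

Here the fitted model matches the linear data-generating process, so the mean estimate should sit close to the true value of -0.12 mmol/L. Replacing the linear outcome with a log₂ or ilr relationship would turn this into one of the mismatched scenarios the study examines.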
Results: The performance of each approach to analysing compositional data depends on how closely its parameterisation matches the true data generating process. Overall, we demonstrated that the consequences of using an incorrect parameterisation (e.g. using CoDA when the true relationship is linear) are more severe for larger reallocations (e.g. 10-min or 100-kcal) than for 1-unit reallocations. The implications of choosing an unsuitable approach may be starker in compositional data with variable totals. For example, while models with ratio variables are mathematically equivalent to linear models in compositional data with fixed totals, their estimates may be radically different for variable totals.
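The contrast between fixed and variable totals for ratio variables can be illustrated numerically. A minimal sketch, assuming hypothetical coefficients, simulated inputs, and a linear true relationship: with a fixed total, proportions are just minutes divided by a constant, so a linear model on ratio variables spans the same columns and gives identical fitted values. With a variable total, dividing by each person's own total changes the model, and the implied effect of a 100-kcal reallocation depends on that person's total. The gamma-distributed intakes and the choice of omitted component are illustrative assumptions, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
one = np.ones(n)


def ols_fit(X, y):
    """OLS via least squares; X must already contain an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, coef


# Fixed total: sleep, sedentary, light, MVPA minutes summing to 1440 (hypothetical)
t = rng.dirichlet([8.0, 10.0, 5.0, 1.0], size=n) * 1440
y_t = 5.0 + t @ np.array([-0.001, 0.002, -0.001, -0.010]) + rng.normal(0, 0.5, n)
keep = [0, 2, 3]  # sedentary omitted (leave-one-out)
fit_lin, _ = ols_fit(np.column_stack([one, t[:, keep]]), y_t)
fit_ratio, _ = ols_fit(np.column_stack([one, t[:, keep] / 1440]), y_t)
print("fixed total, identical fitted values:", np.allclose(fit_lin, fit_ratio))

# Variable total: carbohydrate, fat, protein energy in kcal (hypothetical)
e = rng.gamma(shape=[20.0, 12.0, 8.0], scale=[50.0, 60.0, 45.0], size=(n, 3))
total = e.sum(axis=1)
y_e = 5.0 + e @ np.array([0.0010, 0.0004, -0.0020]) + rng.normal(0, 0.5, n)

# Isocaloric linear model: total + fat + protein (carbohydrate omitted)
fit_lin_v, b_lin = ols_fit(np.column_stack([one, total, e[:, 1], e[:, 2]]), y_e)
# Ratio-variable model: total + fat/total + protein/total
fit_ratio_v, b_ratio = ols_fit(
    np.column_stack([one, total, e[:, 1] / total, e[:, 2] / total]), y_e)
print("variable total, identical fitted values:",
      np.allclose(fit_lin_v, fit_ratio_v))

# Effect of moving 100 kcal from carbohydrate to protein at a constant total
print("linear model:", round(100 * b_lin[3], 3))
for tot in (1700.0, 2100.0, 2500.0):
    # protein/total rises by 100/tot, so the implied effect depends on tot
    print(f"ratio model at {tot:.0f} kcal:", round(b_ratio[3] * 100 / tot, 3))
```

In the linear model the 100-kcal estimate is a single number (the true value here is 100 × (-0.0020 - 0.0010) = -0.3), whereas the ratio model implies a different effect at every total energy intake.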
Conclusions: Compositional data with fixed and variable totals behave differently. All existing approaches to analysing such data have utility, but they need to be selected carefully. Investigators should explore the shape of the relationships between the compositional components and the outcome and choose the approach that best matches it.
About the journal:
BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.