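A comparison of methods for analysing compositional data with fixed and variable totals: a simulation study using the examples of time-use and dietary data.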

Impact factor: 3.9; journal zone: 3 (Medicine); JCR: Q1, HEALTH CARE SCIENCES & SERVICES
Georgia D Tomova, Rosemary Walmsley, Laurie Berrie, Michelle A Morris, Peter W G Tennant
BMC Medical Research Methodology, 25(1): 100. Published 2025-04-17.
DOI: 10.1186/s12874-025-02509-1
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12004694/pdf/
Citations: 0

Abstract

Background: Compositional data comprise the parts of a 'whole' (or 'total'), which sum to that 'whole'. The 'whole' may vary between units of analysis, or it may be fixed (constant). For example, total energy intake (a variable total) is the sum of intake from all foods or macronutrients, whereas total time in a day (a fixed total) is the sum of time spent engaging in various activities. Different approaches exist for analysing compositional data, such as the isocaloric or isotemporal model, ratio variables, and compositional data analysis (CoDA). Although the performance of these approaches has been compared previously, such comparisons have only been conducted in real data. Because the true relationships are unknown in real data, it is difficult to compare how well models estimate a known effect. We use simulated data with different parametric relationships to explore and demonstrate the performance of each approach under a range of conditions.
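
CoDA usually works on log-ratio coordinates rather than on raw amounts. The minimal sketch below assumes pivot coordinates as the isometric log-ratio (ilr) basis; the abstract does not say which ilr basis the study used. It shows why fixed and variable totals need different handling: ilr coordinates depend only on the relative sizes of the parts, so two diets with identical proportions but different total energy map to the same point. The activity categories, minutes, and kcal values are hypothetical.

```python
import math


def pivot_ilr(parts):
    """Pivot coordinates, one common isometric log-ratio (ilr) basis.

    For D strictly positive parts x_1..x_D, coordinate i (1-based, i < D) is
    sqrt((D - i) / (D - i + 1)) * ln(x_i / geometric_mean(x_{i+1}, ..., x_D)).
    """
    d = len(parts)
    if d < 2 or any(p <= 0 for p in parts):
        raise ValueError("need at least two strictly positive parts")
    coords = []
    for i in range(d - 1):
        rest = parts[i + 1:]
        k = len(rest)  # number of parts after the pivot part (D - i in 1-based terms)
        log_gm_rest = sum(math.log(p) for p in rest) / k
        coords.append(math.sqrt(k / (k + 1)) * (math.log(parts[i]) - log_gm_rest))
    return coords


# Fixed total: hypothetical minutes of sleep, sedentary, light activity, MVPA (sum to 1440).
day = [480.0, 600.0, 300.0, 60.0]

# Variable total: two hypothetical diets (kcal from carbohydrate, fat, protein)
# with identical proportions but different total energy (2000 vs 3000 kcal).
diet_a = [1000.0, 700.0, 300.0]
diet_b = [1500.0, 1050.0, 450.0]

ilr_day = pivot_ilr(day)
ilr_a = pivot_ilr(diet_a)
ilr_b = pivot_ilr(diet_b)
print(sum(day), ilr_day)
print(ilr_a)
print(ilr_b)  # same coordinates as diet_a: the 1000-kcal difference in total is invisible
```

Because the total drops out of the coordinates, when the total varies and itself matters for the outcome it has to enter the model by some other route, for example as a separate covariate.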

Methods: We simulated physical activity time-use and dietary data as examples of compositional data with fixed and variable totals, respectively, using different parametric relationships between the compositional components and the outcome (fasting plasma glucose): linear, log2, and isometric log-ratios. We evaluated the performance of a range of generalised linear and additive models, as well as CoDA, in estimating the effects of 1-unit reallocations and of either 10-unit (physical activity) or 100-unit (dietary data) reallocations under each parametric scenario. We simulated 10,000 datasets, each with 1,000 observations.
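
One cell of a simulation like this can be sketched in plain Python. This is a scaled-down illustration, not the authors' code. It assumes a four-part day (sleep, sedentary, light activity, MVPA) closed to 1440 minutes, a linear data-generating process with hypothetical coefficients for fasting plasma glucose, and a leave-one-out (isotemporal) linear model with sedentary time as the omitted reference. Because the total is constant it is absorbed by the intercept, so the MVPA coefficient estimates the effect of moving 1 minute from sedentary time to MVPA. It runs 100 replicates of 500 observations rather than 10,000 of 1,000; the helper names are hypothetical.

```python
import random
import statistics

TOTAL = 1440.0                              # fixed total: minutes per day
MEANS = (480.0, 600.0, 300.0, 60.0)         # hypothetical sleep, sedentary, light, MVPA
BETA = (0.0005, 0.0010, -0.0005, -0.0040)   # hypothetical mmol/L per minute (linear DGP)
TRUE_1MIN = BETA[3] - BETA[1]               # true effect of 1 min sedentary -> MVPA


def ols(columns, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination.

    `columns` is a list of predictor columns; include a column of 1s for the intercept.
    """
    p = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate_days(n, rng):
    """Draw n compositions around MEANS and rescale each row to sum to TOTAL."""
    rows = []
    for _ in range(n):
        w = [m * rng.lognormvariate(0.0, 0.3) for m in MEANS]
        s = sum(w)
        rows.append([TOTAL * v / s for v in w])
    return rows


def one_replicate(n, rng):
    """Simulate one dataset and return the estimated 1-min sedentary -> MVPA effect."""
    x = simulate_days(n, rng)
    y = [5.0 + sum(b * v for b, v in zip(BETA, r)) + rng.gauss(0.0, 0.5) for r in x]
    # Leave-one-out model: intercept + sleep + light + MVPA (sedentary omitted)
    cols = [[1.0] * n, [r[0] for r in x], [r[2] for r in x], [r[3] for r in x]]
    return ols(cols, y)[3]


rng = random.Random(2025)
estimates = [one_replicate(500, rng) for _ in range(100)]
mean_1 = statistics.fmean(estimates)
bias_1 = mean_1 - TRUE_1MIN
bias_10 = 10 * mean_1 - 10 * TRUE_1MIN  # a linear model scales a 10-min reallocation exactly
print(f"true 1-min effect {TRUE_1MIN:.4f}, mean estimate {mean_1:.4f}")
print(f"bias: 1-min {bias_1:.5f}, 10-min {bias_10:.5f}")
```

Under this linear data-generating process the 10-minute bias is exactly ten times the 1-minute bias. Replacing the outcome model for `y` with a log2 or ilr relationship breaks that proportionality, because the true effect of a 10-minute reallocation is then no longer ten times the 1-minute effect, while a linear model's estimate still is.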

Results: The performance of each approach to analysing compositional data depends on how closely its parameterisation matches the true data-generating process. Overall, we demonstrated that the consequences of using an incorrect parameterisation (e.g. using CoDA when the true relationship is linear) are more severe for larger reallocations (e.g. 10-min or 100-kcal) than for 1-unit reallocations. The implications of choosing an unsuitable approach may be starker for compositional data with variable totals. For example, while models with ratio variables are mathematically equivalent to linear models for compositional data with fixed totals, their estimates may be radically different for variable totals.
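
The fixed-versus-variable contrast for ratio variables can be checked numerically. This minimal sketch assumes that the ratio model enters all but one component as a proportion of the total, and that the linear model enters absolute amounts; both add the total as a covariate only when it varies. These specifications and all numbers are hypothetical illustrations rather than the study's exact models. With a fixed total of 1440, each ratio column is the linear column divided by a constant, so the two models imply the same 10-unit effect. With variable totals, the ratio model's implied effect of moving 100 kcal into component 1 is its coefficient times 100/T. It therefore depends on each person's total energy T, whereas the linear model gives a single number.

```python
import math
import random


def ols(columns, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate(n, means, beta, rng, fixed_total=None):
    """Hypothetical three-part compositions with an outcome linear in absolute amounts.

    If fixed_total is given, each row is rescaled to sum to it; otherwise the total varies.
    """
    x, y = [], []
    for _ in range(n):
        parts = [m * rng.lognormvariate(0.0, 0.3) for m in means]
        if fixed_total is not None:
            s = sum(parts)
            parts = [fixed_total * v / s for v in parts]
        x.append(parts)
        y.append(5.0 + sum(b * v for b, v in zip(beta, parts)) + rng.gauss(0.0, 0.5))
    return x, y


def fit_linear_and_ratio(x, y, total_varies):
    """Fit both models; component 3 is the omitted reference in each."""
    n = len(y)
    one = [1.0] * n
    x1 = [r[0] for r in x]
    x2 = [r[1] for r in x]
    t = [sum(r) for r in x]
    extra = [t] if total_varies else []  # a constant total would duplicate the intercept
    linear = ols([one, x1, x2] + extra, y)
    ratio = ols([one, [u / tt for u, tt in zip(x1, t)],
                 [u / tt for u, tt in zip(x2, t)]] + extra, y)
    return linear, ratio


rng = random.Random(7)

# Fixed total (time use, minutes): 10-minute reallocation from component 3 to component 1
x, y = simulate(400, (500.0, 600.0, 340.0), (0.0005, 0.001, -0.002), rng, fixed_total=1440.0)
fixed_lin, fixed_rat = fit_linear_and_ratio(x, y, total_varies=False)
print("fixed total, linear:", 10 * fixed_lin[1], " ratio:", fixed_rat[1] * 10 / 1440.0)

# Variable total (diet, kcal): 100-kcal reallocation from component 3 to component 1
x, y = simulate(400, (1000.0, 700.0, 300.0), (0.001, 0.0015, -0.0005), rng)
var_lin, var_rat = fit_linear_and_ratio(x, y, total_varies=True)
print("variable total, linear:", 100 * var_lin[1])
for total in (1800.0, 2800.0):
    print(f"variable total, ratio at T={total:.0f}:", var_rat[1] * 100 / total)
```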

Conclusions: Compositional data with fixed and variable totals behave differently. All existing approaches to analysing such data have utility, but they need to be selected carefully. Investigators should explore the shape of the relationships between the compositional components and the outcome and choose the approach that best matches it.
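
A minimal sketch of that advice compares two parameterisations on the same data and checks which one matches. It assumes a log2 data-generating process with hypothetical coefficients and a four-part day closed to 1440 minutes. It also assumes a log2 model that enters the log2 of every part; the abstract names the log2 parameterisation but not its exact specification. The sketch fits both a linear leave-one-out model and the log2 model and compares their residual sums of squares. It then compares each model's implied 1- and 10-minute sedentary-to-MVPA reallocation, starting from a hypothetical reference composition, against the true effect. It illustrates the general point only and does not reproduce the study's scenarios or results.

```python
import math
import random

TOTAL = 1440.0
REF = (480.0, 600.0, 300.0, 60.0)     # hypothetical sleep, sedentary, light, MVPA; sums to 1440
BETA_LOG2 = (-0.3, 0.5, -0.2, -2.0)   # hypothetical mmol/L per doubling of each part


def ols(columns, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def rss(columns, y, b):
    """Residual sum of squares of a fitted linear predictor."""
    fitted = [sum(bj * col[i] for bj, col in zip(b, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


rng = random.Random(11)
n = 2000
x = []
for _ in range(n):
    w = [m * rng.lognormvariate(0.0, 0.5) for m in REF]
    s = sum(w)
    x.append([TOTAL * v / s for v in w])
# True (hypothetical) data-generating process: linear in the log2 of each part
y = [5.0 + sum(b * math.log2(v) for b, v in zip(BETA_LOG2, r)) + rng.gauss(0.0, 0.5) for r in x]

one = [1.0] * n
lin_cols = [one] + [[r[k] for r in x] for k in (0, 2, 3)]          # sedentary omitted
log_cols = [one] + [[math.log2(r[k]) for r in x] for k in range(4)]  # all four parts
b_lin = ols(lin_cols, y)  # [intercept, sleep, light, MVPA]
b_log = ols(log_cols, y)  # [intercept, sleep, sedentary, light, MVPA]
rss_lin = rss(lin_cols, y, b_lin)
rss_log2 = rss(log_cols, y, b_log)


def truth(delta):
    """True effect of moving `delta` minutes from sedentary to MVPA, starting at REF."""
    return (BETA_LOG2[1] * (math.log2(REF[1] - delta) - math.log2(REF[1]))
            + BETA_LOG2[3] * (math.log2(REF[3] + delta) - math.log2(REF[3])))


def log2_model_effect(delta):
    """The same reallocation, as implied by the fitted log2 model."""
    return (b_log[2] * (math.log2(REF[1] - delta) - math.log2(REF[1]))
            + b_log[4] * (math.log2(REF[3] + delta) - math.log2(REF[3])))


for delta in (1.0, 10.0):
    print(f"{delta:>4.0f} min: truth {truth(delta):.4f}, "
          f"linear {delta * b_lin[3]:.4f}, log2 {log2_model_effect(delta):.4f}")
print(f"RSS linear {rss_lin:.1f}, RSS log2 {rss_log2:.1f}")
```

The two models have different numbers of parameters, so an information criterion such as AIC is a fairer yardstick than raw residual sums of squares. Residual-versus-component plots or the generalised additive models evaluated in the study serve the same exploratory purpose.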

Source journal: BMC Medical Research Methodology (Medicine, Health Care)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks
About the journal: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.