Comparison of dimension reduction methods for the identification of heart-healthy dietary patterns

Observational Studies. Pub Date: 2023-03-01. DOI: 10.1353/obs.2023.0020

Natalie C. Gasca, R. McClelland

Observational Studies, volume 9, issue 1, pages 123-156. https://doi.org/10.1353/obs.2023.0020


Abstract

Most nutritional epidemiology studies investigating diet-disease trends use unsupervised dimension reduction methods, like principal component regression (PCR) and sparse PCR (SPCR), to create dietary patterns. Supervised methods, such as partial least squares (PLS), sparse PLS (SPLS), and Lasso, offer the possibility of more concisely summarizing the foods most related to a disease. In this study we evaluate these five methods for interpretable reduction of food frequency questionnaire (FFQ) data when analyzing a univariate continuous cardiac-related outcome, via a simulation study and a data application. We also demonstrate that, when using PLS, different scientific premises require different approaches to adjusting for covariates. To emulate food groups in the simulation, we generated blocks of normally distributed predictors with varying intra-block covariances; only nine of 24 predictors contributed to the normal response. When block covariances were informed by FFQ data, the only methods that performed variable selection were Lasso and SPLS, which selected two and four irrelevant variables, respectively. SPLS had the lowest prediction error, and both PLS-based methods constructed four patterns, while PCR and SPCR created 24 patterns. In the data application, these methods were applied to 120 FFQ variables and baseline body mass index (BMI) from the Multi-Ethnic Study of Atherosclerosis, which includes 6814 participants aged 45-84, and we adjusted for age, gender, race/ethnicity, exercise, and total energy intake. From 120 variables, PCR created 17 BMI-related patterns and PLS selected one pattern; SPLS used only five variables to create two patterns. All methods exhibited similar predictive performance. Specifically, SPLS’s first pattern highlighted hamburger and diet soda intake (positive associations with BMI), reflecting a fast food diet. By selecting fewer patterns and foods, SPLS can create interpretable dietary patterns while maintaining predictive ability.
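The simulation design described in the abstract (blocks of correlated normal predictors, nine of 24 contributing to a normal response) can be illustrated with a minimal sketch. This is not the authors' code. The block layout (four blocks of six), the intra-block correlations, the unit coefficients, the sample size, and the function names `simulate_blocks` and `first_pls_weights` are all assumptions made here for illustration. The second function computes only the first PLS weight vector for a single response, which is proportional to the covariance between each centered predictor and the centered response.

```python
import math
import random


def simulate_blocks(n, block_rhos, block_size, active, noise_sd=1.0, seed=0):
    """Draw n rows of block-correlated normal predictors and a normal response.

    Predictors in the same block share a latent factor, so any two of them
    have correlation rho; predictors in different blocks are independent.
    `active` maps predictor index -> coefficient; all others are zero.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = []
        for rho in block_rhos:
            shared = rng.gauss(0.0, 1.0)
            for _ in range(block_size):
                own = rng.gauss(0.0, 1.0)
                row.append(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own)
        X.append(row)
        signal = sum(beta * row[j] for j, beta in active.items())
        y.append(signal + rng.gauss(0.0, noise_sd))
    return X, y


def first_pls_weights(X, y):
    """Unit-norm first PLS weight vector: proportional to X^T y after centering."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


# Hypothetical settings (not from the paper): 4 blocks x 6 predictors = 24,
# with 9 active predictors spread over the first three blocks.
BLOCK_RHOS = [0.2, 0.4, 0.6, 0.8]
ACTIVE = {j: 1.0 for j in (0, 1, 2, 6, 7, 8, 12, 13, 14)}

X, y = simulate_blocks(n=500, block_rhos=BLOCK_RHOS, block_size=6, active=ACTIVE)
weights = first_pls_weights(X, y)
```

In a setup like this, inactive predictors that share a block with active ones are still correlated with the response, so a dense PLS weight vector generally assigns them nonzero weight. Sparse variants such as SPLS address this by penalizing the weights so that small ones are set to zero, which is the kind of variable selection the abstract evaluates.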

