Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

Research report (Health Effects Institute). Pub Date: 2015-06-01

Brent A Coull, Jennifer F Bobb, Gregory A Wellenius, Marianthi-Anna Kioumourtzoglou, Murray A Mittleman, Petros Koutrakis, John J Godleski

Volume 183, Pt 1-2, pages 5-50.


Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

Introduction: The United States Environmental Protection Agency (U.S. EPA) currently regulates individual air pollutants on a pollutant-by-pollutant basis, adjusted for other pollutants and potential confounders. However, the National Academies of Science concluded that a multipollutant regulatory approach that takes into account the joint effects of multiple constituents is likely to be more protective of human health. Unfortunately, the large majority of existing research has focused on the health effects of a single pollutant, or of a single pollutant with control for the independent effects of a small number of copollutants. Limitations in existing statistical methods are at least partially responsible for this lack of information on joint effects. The goal of this project was to fill this gap by developing flexible statistical methods to estimate the joint effects of multiple pollutants, while allowing for potential nonlinear or nonadditive associations between a given pollutant and the health outcome of interest.

Methods: We proposed Bayesian kernel machine regression (BKMR) methods as a way to simultaneously achieve the multifaceted goals of variable selection, flexible estimation of the exposure-response relationship, and inference on the strength of the association between individual pollutants and health outcomes in a health effects analysis of mixtures. We first developed a BKMR variable-selection approach, which we call component-wise variable selection, to make estimating such a potentially complex exposure-response function possible by effectively using two types of penalization (or regularization) of the multivariate exposure-response surface. Next we developed an extension of this first variable-selection approach that incorporates knowledge about how pollutants might group together, such as multiple constituents of particulate matter that might represent a common pollution source category. This second grouped, or hierarchical, variable-selection procedure is applicable when groups of highly correlated pollutants are being studied. To investigate the properties of the proposed methods, we conducted three simulation studies designed to evaluate the ability of BKMR to estimate environmental mixtures responsible for health effects under potentially complex but plausible exposure-response relationships. An attractive feature of our simulation studies is that we used actual exposure data rather than simulated values. This real-data simulation approach allowed us to evaluate the performance of BKMR and several other models under realistic joint distributions of multipollutant exposure. The simulation studies compared the two proposed variable-selection approaches (component-wise and hierarchical variable selection) with each other and with existing frequentist treatments of kernel machine regression (KMR). After the simulation studies, we applied the newly developed methods to an epidemiologic data set and to a toxicologic data set. 
To illustrate the applicability of the proposed methods to human epidemiologic data, we estimated associations between short-term exposures to fine particulate matter constituents and blood pressure in the Maintenance of Balance, Independent Living, Intellect, and Zest in the Elderly (MOBILIZE) Boston study, a prospective cohort study of elderly subjects. To illustrate the applicability of these methods to animal toxicologic studies, we analyzed data on the associations of both blood pressure and heart rate with the composition of the concentrated ambient particles (CAPs) to which canines were exposed in a study conducted at the Harvard T. H. Chan School of Public Health (the Harvard Chan School; formerly Harvard School of Public Health; Bartoli et al. 2009).
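The component-wise selection idea described in the Methods can be illustrated with a small sketch. It assumes a Gaussian kernel with one nonnegative weight per pollutant, K(z, z') = exp(-Σ_m r_m (z_m - z'_m)²), so that a weight of zero removes that pollutant from the exposure-response surface. The abstract does not state the kernel or the fitting algorithm, so this form is an assumption, and the fit below is a plain kernel ridge regression rather than the Bayesian procedure of the report. The names `weighted_gaussian_kernel`, `fit_kernel_ridge`, `predict`, and `lam`, and the simulated exposures, are hypothetical.

```python
import numpy as np


def weighted_gaussian_kernel(Z1, Z2, r):
    """K[i, j] = exp(-sum_m r[m] * (Z1[i, m] - Z2[j, m])**2)."""
    diff = Z1[:, None, :] - Z2[None, :, :]      # shape (n1, n2, M)
    return np.exp(-np.einsum("ijm,m->ij", diff ** 2, r))


def fit_kernel_ridge(Z, y, r, lam=1.0):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = weighted_gaussian_kernel(Z, Z, r)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


def predict(Z_train, alpha, Z_new, r):
    """Predicted exposure-response surface at the rows of Z_new."""
    return weighted_gaussian_kernel(Z_new, Z_train, r) @ alpha


rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))                    # 3 hypothetical pollutants
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2        # pollutant 3 has no effect
r = np.array([1.0, 1.0, 0.0])                   # weight 0 excludes pollutant 3
alpha = fit_kernel_ridge(Z, y, r)

# Changing only the excluded pollutant leaves the predictions unchanged.
Z_shift = Z.copy()
Z_shift[:, 2] += 10.0
same = np.allclose(predict(Z, alpha, Z, r), predict(Z, alpha, Z_shift, r))
```

In a Bayesian treatment the weights would be given priors that allow exact zeros and be estimated from the data; here they are fixed by hand only to show how a zero weight drops a component.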

Results: We successfully developed the theory and computational tools required to apply the proposed methods to the motivating data sets. Collectively, the three simulation studies showed that component-wise variable selection can identify important pollutants within a mixture as long as the correlations among pollutant concentrations are low to moderate. The hierarchical variable-selection method was more effective in high-dimension, high-correlation settings. Variable selection in existing frequentist KMR models can incur inflated type I error rates, particularly when pollutants are highly correlated. The analyses of the MOBILIZE data yielded evidence of a linear and additive association of black carbon (BC) or Cu exposure with standing diastolic blood pressure (DBP), and a linear association of S exposure with standing systolic blood pressure (SBP). BC is generally regarded as a marker of traffic emissions; Cu is thought to be a marker of urban road dust associated with traffic; and S is a marker of power plant emissions or regional long-range transported air pollution or both. Therefore, these analyses of the MOBILIZE data set suggest that emissions from these three source categories were most strongly associated with hemodynamic responses in this cohort. In contrast, in the Harvard Chan School canine study, after controlling for an overall effect of CAPs exposure, we did not observe any associations between DBP or SBP and any elemental concentrations. Instead, we observed strong evidence of an association between Mn concentrations and heart rate, in that heart rate increased linearly with increasing concentrations of Mn. According to the positive matrix factorization (PMF) source apportionment analyses of the multipollutant data set from the Harvard Chan School Boston Supersite, Mn loads on the two factors that represent the mobile and road dust source categories. 
The results of the BKMR analyses in both the MOBILIZE and canine studies were similar to those from existing linear mixed model analyses of the same multipollutant data because the effects have linear and additive forms that could also have been detected using standard methods.
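The role of correlation in the simulation findings above can be illustrated with a toy example. The sketch below is not the report's simulation design (which used actual exposure data); it assumes two simulated pollutants with a chosen correlation `rho`, where only the first affects the outcome, and compares how much outcome variance each pollutant explains on its own. The function names `r_squared` and `simulate` and all parameter values are hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """R^2 from a simple linear regression of y on x (with intercept)."""
    return np.corrcoef(x, y)[0, 1] ** 2


def simulate(rho, n=500, seed=0):
    """Two pollutants with correlation ~rho; only z1 affects the outcome."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = z[:, 0] + rng.normal(scale=0.5, size=n)
    return r_squared(z[:, 0], y), r_squared(z[:, 1], y)


r2_true_low, r2_proxy_low = simulate(rho=0.2)
r2_true_high, r2_proxy_high = simulate(rho=0.95)

print(f"rho=0.20: causal R^2={r2_true_low:.2f}, correlated proxy R^2={r2_proxy_low:.2f}")
print(f"rho=0.95: causal R^2={r2_true_high:.2f}, correlated proxy R^2={r2_proxy_high:.2f}")
```

When the correlation is high, the non-causal pollutant explains nearly as much of the outcome as the causal one, which is one intuition for why selecting individual components becomes hard and why selecting at the level of correlated groups can help.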

Conclusions: This work provides several contributions to the KMR literature. First, to our knowledge this is the first time KMR methods have been used to estimate the health effects of multipollutant mixtures. Second, we developed a novel hierarchical variable-selection approach within BKMR that is able to account for the structure of the mixture and systematically handle highly correlated exposures. The analyses of the epidemiologic and toxicologic data on associations between fine particulate matter constituents and blood pressure or heart rate demonstrated associations with constituents that are typically associated with traffic emissions, power plants, and long-range transported pollutants. The simulation studies showed that the BKMR methods proposed here work well for small to moderate data sets; more work is needed to develop computationally fast methods for large data sets. This will be a goal of future work.
