Regression Trees for Longitudinal Data with Baseline Covariates
Authors: Madan Gopal Kundu, Jaroslaw Harezlak
Journal: Biostatistics and Epidemiology, vol. 3, no. 1, pp. 1-22
Published: 2019-01-01 (Epub 2018-12-31)
DOI: https://doi.org/10.1080/24709360.2018.1557797
Citations: 10
Abstract
Longitudinal changes in a population of interest are often heterogeneous and may be influenced by a combination of baseline factors. In such cases, traditional linear mixed effects models (Laird and Ware, 1982), which assume a common parametric form for the mean structure, may not be applicable. We show that the regression tree methodology for longitudinal data can identify and characterize longitudinally homogeneous subgroups. Most of the currently available regression tree construction methods are either limited to a repeated measures scenario or combine the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under the conditional inference framework (Hothorn, Hornik and Zeileis, 2006) that overcomes these limitations by using a two-step approach. The LongCART algorithm first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type-I error controlled, which guards against variable selection bias, over-fitting, and spurious splitting. We obtained asymptotic results for the proposed instability test and examined its finite-sample behavior through simulation studies. The comparative performance of the LongCART algorithm was evaluated empirically via simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.
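The two-step idea described in the abstract (first test each baseline covariate for parameter instability, then search for a cut point only on the selected covariate) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several simplifying assumptions that are not in the paper: each subject's trajectory is summarized by an ordinary least-squares slope instead of a mixed-model fit, the instability statistic is a generic CUSUM of centered slopes compared against the supremum of a Brownian bridge, covariates are continuous with distinct values, p-values are Bonferroni-adjusted, and the split criterion is the within-group sum of squares. All function names and the toy data are hypothetical.

```python
import math


def ols_slope(times, values):
    """Least-squares slope of one subject's values on time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def sup_bridge_pvalue(x, terms=100):
    """P(sup |B(t)| > x) for a standard Brownian bridge (Kolmogorov series)."""
    if x <= 0:
        return 1.0
    s = sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))  # clip truncation error


def instability_test(slopes, covariate):
    """Order subjects by the covariate and accumulate centered slopes (CUSUM)."""
    n = len(slopes)
    mean = sum(slopes) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (n - 1))
    if sd == 0.0:
        return 0.0, 1.0  # identical slopes: no evidence of instability
    running, sup = 0.0, 0.0
    for i in sorted(range(n), key=lambda j: covariate[j]):
        running += slopes[i] - mean
        sup = max(sup, abs(running))
    stat = sup / (sd * math.sqrt(n))
    return stat, sup_bridge_pvalue(stat)


def best_split(slopes, covariate):
    """Cut point minimizing the within-group sum of squared slope deviations."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(covariate, slopes))
    best_cut, best_loss = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot cut between tied covariate values
        loss = sse([s for _, s in pairs[:k]]) + sse([s for _, s in pairs[k:]])
        if loss < best_loss:
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_loss = loss
    return best_cut


def split_node(subjects, alpha=0.05):
    """Step 1: pick the covariate by adjusted p-value. Step 2: find its cut."""
    slopes = [ols_slope(s["times"], s["y"]) for s in subjects]
    names = sorted(subjects[0]["baseline"])
    pvals = {}
    for name in names:
        cov = [s["baseline"][name] for s in subjects]
        _, p = instability_test(slopes, cov)
        pvals[name] = min(1.0, p * len(names))  # Bonferroni adjustment
    chosen = min(pvals, key=pvals.get)
    if pvals[chosen] >= alpha:
        return None  # no significant instability: do not split this node
    cov = [s["baseline"][chosen] for s in subjects]
    return chosen, best_split(slopes, cov)


# Toy data (made up): the slope changes at x1 = 20; x2 is a scrambled covariate.
subjects = []
for i in range(40):
    slope = (1.0 if i < 20 else 3.0) + 0.01 * ((3 * i) % 5 - 2)
    times = [0.0, 1.0, 2.0, 3.0]
    subjects.append({
        "baseline": {"x1": float(i), "x2": float((7 * i) % 40)},
        "times": times,
        "y": [5.0 + slope * t for t in times],
    })

result = split_node(subjects)
print(result)
```

Because the cut-point search runs only after a covariate passes the adjusted test, a node with no significant instability is left unsplit. In this sketch, that is how the split decision at each node stays type-I error controlled.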