{"title":"Modeling the Effect of a Preventive Intervention on the Natural History of Cancer: Application to the Prostate Cancer Prevention Trial","authors":"P. Pinsky, Ruth Etzioni, N. Howlader, P. Goodman, I. Thompson","doi":"10.2202/1557-4679.1036","DOIUrl":"https://doi.org/10.2202/1557-4679.1036","url":null,"abstract":"The Prostate Cancer Prevention Trial (PCPT) recently demonstrated a significant reduction in prostate cancer incidence of about 25% among men taking finasteride compared to men taking placebo. However, the effect of finasteride on the natural history of prostate cancer is not well understood. We adapted a convolution model developed by Pinsky (2001) to characterize the natural history of prostate cancer in the presence and absence of finasteride. The model was applied to data from 10,995 men in PCPT who had disease status determined by interim diagnosis of prostate cancer or end-of-study biopsy. Prostate cancer cases were either screen-detected by Prostate-Specific Antigen (PSA), biopsy-detected at the end of the study, or clinically detected, that is, detected by methods other than PSA screening. The hazard ratio (HR) for the incidence of preclinical disease on finasteride versus placebo was 0.42 (95% CI: 0.20-0.58). The progression from preclinical to clinical disease was relatively unaffected by finasteride, with mean sojourn time being 16 years for placebo cases and 18.5 years for finasteride cases (p-value for difference = 0.2). We conclude that finasteride appears to affect prostate cancer primarily by preventing the emergence of new, preclinical tumors with little impact on established, latent disease.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1036","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Targeted Maximum Likelihood Learning","authors":"M. J. van der Laan, D. Rubin","doi":"10.2202/1557-4679.1043","DOIUrl":"https://doi.org/10.2202/1557-4679.1043","url":null,"abstract":"Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of a given likelihood based density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and might therefore result in a poor estimator of a particular smooth functional of the density. In this article we propose a one step (and, by iteration, k-th step) targeted maximum likelihood density estimator which involves 1) creating a hardest parametric submodel with parameter epsilon through the given density estimator with score equal to the efficient influence curve of the pathwise differentiable parameter at the density estimator, 2) estimating epsilon with the maximum likelihood estimator, and 3) defining a new density estimator as the corresponding update of the original density estimator. We show that iteration of this algorithm results in a targeted maximum likelihood density estimator which solves the efficient influence curve estimating equation and thereby yields a locally efficient estimator of the parameter of interest, under regularity conditions. In particular, we show that, if the parameter is linear and the model is convex, then the targeted maximum likelihood estimator is often achieved in the first step, and it results in a locally efficient estimator at an arbitrary (e.g., heavily misspecified) starting density. We also show that the targeted maximum likelihood estimators are now in full agreement with the locally efficient estimating function methodology as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2003), creating, in particular, algebraic equivalence between the double robust locally efficient estimators using the targeted maximum likelihood estimators as an estimate of its nuisance parameters, and targeted maximum likelihood estimators. In addition, it is argued that the targeted MLE has various advantages relative to the current estimating function based approach. We proceed by providing data driven methodologies to select the initial density estimator for the targeted MLE, thereby providing data adaptive targeted maximum likelihood estimation methodology. We illustrate the method with various worked out examples.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1043","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Choice of Monitoring Mechanism for Optimal Nonparametric Functional Estimation for Binary Data","authors":"N. Jewell, M. J. van der Laan, S. Shiboski","doi":"10.2202/1557-4679.1031","DOIUrl":"https://doi.org/10.2202/1557-4679.1031","url":null,"abstract":"Optimal designs of dose levels in order to estimate parameters from a model for binary response data have a long and rich history. These designs are based on parametric models. Here we consider fully nonparametric models with interest focused on estimation of smooth functionals using plug-in estimators based on the nonparametric maximum likelihood estimator. An important application of the results is the derivation of the optimal choice of the monitoring time distribution function for current status observation of a survival distribution. The optimal choice depends in a simple way on the dose-response function and the form of the functional. The results can be extended to allow dependence of the monitoring mechanism on covariates.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of a Variable Importance Measure Method","authors":"M. Birkner, M. J. van der Laan","doi":"10.2202/1557-4679.1013","DOIUrl":"https://doi.org/10.2202/1557-4679.1013","url":null,"abstract":"Van der Laan (2005) proposed a targeted method used to construct variable importance measures coupled with respective statistical inference. This technique involves determining the importance of a variable in predicting an outcome. This method can be applied as inverse probability of treatment weighted (IPTW) or double robust inverse probability of treatment weighted (DR-IPTW) estimators. The variance and respective p-value of the estimate are calculated by estimating the influence curve. This article applies the Van der Laan (2005) variable importance measures and corresponding inference to HIV-1 sequence data. In this application, the method is targeted at every codon position. In this data application, protease and reverse transcriptase codon positions on the HIV-1 strand are assessed to determine their respective variable importance, with respect to an outcome of viral replication capacity. We estimate the DR-IPTW W-adjusted variable importance measure for a specified set of potential effect modifiers W. In addition, simulations were performed on two separate datasets to examine the DR-IPTW estimator.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Two Sample Problem for Multiple Categorical Variables","authors":"A. DiRienzo","doi":"10.2202/1557-4679.1019","DOIUrl":"https://doi.org/10.2202/1557-4679.1019","url":null,"abstract":"Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1019","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Distribution Functions via Empirical Likelihood","authors":"I. McKeague, Yichuan Zhao","doi":"10.2202/1557-4679.1007","DOIUrl":"https://doi.org/10.2202/1557-4679.1007","url":null,"abstract":"This paper develops empirical likelihood based simultaneous confidence bands for differences and ratios of two distribution functions from independent samples of right-censored survival data. The proposed confidence bands provide a flexible way of comparing treatments in biomedical settings, and bring empirical likelihood methods to bear on important target functions for which only Wald-type confidence bands have been available in the literature. The approach is illustrated with a real data example.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1007","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Regression Model for Dependent Gap Times","authors":"R. Strawderman","doi":"10.2202/1557-4679.1005","DOIUrl":"https://doi.org/10.2202/1557-4679.1005","url":null,"abstract":"A natural choice of time scale for analyzing recurrent event data is the \"gap\" (or sojourn) time between successive events. In many situations it is reasonable to assume correlation exists between the successive events experienced by a given subject. This paper looks at the problem of extending the accelerated failure time (AFT) model to the case of dependent recurrent event data via intensity modeling. Specifically, the accelerated gap times model of Strawderman (2005), a semiparametric intensity model for independent gap time data, is extended to the case of multiplicative gamma frailty. As argued in Aalen & Husebye (1991), incorporating frailty captures the heterogeneity between subjects and the \"hazard\" portion of the intensity model captures gap time variation within a subject. Estimators are motivated using semiparametric efficiency theory and lead to useful generalizations of the rank statistics considered in Strawderman (2005). Several interesting distinctions arise in comparison to the Cox-Andersen-Gill frailty model (e.g., Nielsen et al, 1992; Klein, 1992). The proposed methodology is illustrated by simulation and data analysis.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimens","authors":"M. J. van der Laan, M. Petersen, M. Joffe","doi":"10.2202/1557-4679.1003","DOIUrl":"https://doi.org/10.2202/1557-4679.1003","url":null,"abstract":"Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. Particularly in the context of clinical decision making, such time-varying effect modifiers are often of considerable or even primary interest, as they are used in practice to guide treatment decisions for an individual. In this article we propose a generalization of marginal structural models, which we call history-adjusted marginal structural models (HA-MSM). These models allow estimation of adjusted causal effects of treatment, given the observed past, and are therefore more suitable for making treatment decisions at the individual level and for identification of time-dependent effect modifiers. Specifically, a HA-MSM models the conditional distribution of treatment-specific counterfactual outcomes, conditional on the whole or a subset of the observed past up till a time-point, simultaneously for all time-points. Double robust inverse probability of treatment weighted estimators have been developed and studied in detail for standard MSM. We extend these results by proposing a class of double robust inverse probability of treatment weighted estimators for the unknown parameters of the HA-MSM. In addition, we show that HA-MSM provide a natural approach to identifying the dynamic treatment regimen which follows, at each time-point, the history-adjusted (up till the most recent time point) optimal static treatment regimen. We illustrate our results using an example drawn from the treatment of HIV infection.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2005-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics","authors":"M. Banerjee, J. Wellner","doi":"10.2202/1557-4679.1001","DOIUrl":"https://doi.org/10.2202/1557-4679.1001","url":null,"abstract":"In this paper we introduce three natural \"score statistics\" for testing the hypothesis that F(t_0) takes on a fixed value in the context of nonparametric inference with current status data. These three new test statistics have natural interpretations in terms of certain (weighted) L_2 distances, and are also connected to natural \"one-sided\" scores. We compare these new test statistics with the analogue of the classical Wald statistic and the likelihood ratio statistic introduced in Banerjee and Wellner (2001) for the same testing problem. Under classical \"regular\" statistical problems the likelihood ratio, score, and Wald statistics all have the same chi-squared limiting distribution under the null hypothesis. In sharp contrast, in this non-regular problem all three statistics have different limiting distributions under the null hypothesis. Thus we begin by establishing the limit distribution theory of the statistics under the null hypothesis, and discuss calculation of the relevant critical points for the test statistics. Once the null distribution theory is known, the immediate question becomes that of power. We establish the limiting behavior of the three types of statistics under local alternatives. We have also compared the power of these five different statistics via a limited Monte-Carlo study. Our conclusions are: (a) the Wald statistic is less powerful than the likelihood ratio and score statistics; and (b) one of the score statistics may have more power than the likelihood ratio statistic for some alternatives.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2005-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some Variants of the Backcalculation Method for Estimation of Disease Incidence: An Application to Multiple Sclerosis Data from the Faroe Islands","authors":"N. Jewell, B. Lu","doi":"10.2202/1557-4679.1002","DOIUrl":"https://doi.org/10.2202/1557-4679.1002","url":null,"abstract":"Backcalculation is a technique that was originally developed for the study of HIV incidence. Here we introduce some variants of the estimation technique that allow for (i) correlation of the unobserved disease incidence counts, and (ii) the use of a smoothing step as part of the maximizing step in the EM algorithm to reduce instability due to small diagnosis counts. Both of these issues can be important in the analysis of small \"epidemics.\" In addition, identification of correlation between diagnosis counts provides indirect evidence of correlation among unobserved incidence counts, hinting at the possibility of an infectious agent. We illustrate the ideas by reconstructing an incidence intensity function for the onset of multiple sclerosis, using data from the Faroe Islands. Previously, these data had been examined statistically, by Joseph, Wolfson & Wolfson (1990), to address the issue of infectiousness of multiple sclerosis. We argue that the incidence function cannot directly shed light on the enigmatic origin of multiple sclerosis in the Faroe Islands during World War II, and, in particular, cannot discriminate between hypotheses of an infectious or environmental agent.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2005-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}