{"title":"Model-Based Recursive Partitioning for Subgroup Analyses.","authors":"Heidi Seibold, Achim Zeileis, Torsten Hothorn","doi":"10.1515/ijb-2015-0032","DOIUrl":"https://doi.org/10.1515/ijb-2015-0032","url":null,"abstract":"<p><p>The identification of patient subgroups with differential treatment effects is the first step towards individualised treatments. A current draft guideline by the EMA discusses potentials and problems in subgroup analyses and formulated challenges to the development of appropriate statistical procedures for the data-driven identification of patient subgroups. We introduce model-based recursive partitioning as a procedure for the automated detection of patient subgroups that are identifiable by predictive factors. The method starts with a model for the overall treatment effect as defined for the primary analysis in the study protocol and uses measures for detecting parameter instabilities in this treatment effect. The procedure produces a segmented model with differential treatment parameters corresponding to each patient subgroup. The subgroups are linked to predictive factors by means of a decision tree. 
The method is applied to the search for subgroups of patients suffering from amyotrophic lateral sclerosis that differ with respect to their Riluzole treatment effect, the only currently approved drug for this disease.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"45-63"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0032","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34427724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Adaptive Bias-Reduced Doubly Robust Estimation.","authors":"Karel Vermeulen, Stijn Vansteelandt","doi":"10.1515/ijb-2015-0029","DOIUrl":"https://doi.org/10.1515/ijb-2015-0029","url":null,"abstract":"<p><p>Doubly robust estimators have now been proposed for a variety of target parameters in the causal inference and missing data literature. These consistently estimate the parameter of interest under a semiparametric model when one of two nuisance working models is correctly specified, regardless of which. The recently proposed bias-reduced doubly robust estimation procedure aims to partially retain this robustness in more realistic settings where both working models are misspecified. These so-called bias-reduced doubly robust estimators make use of special (finite-dimensional) nuisance parameter estimators that are designed to locally minimize the squared asymptotic bias of the doubly robust estimator in certain directions of these finite-dimensional nuisance parameters under misspecification of both parametric working models. In this article, we extend this idea to incorporate the use of data-adaptive estimators (infinite-dimensional nuisance parameters), by exploiting the bias reduction estimation principle in the direction of only one nuisance parameter. We additionally provide an asymptotic linearity theorem which gives the influence function of the proposed doubly robust estimator under correct specification of a parametric nuisance working model for the missingness mechanism/propensity score but a possibly misspecified (finite- or infinite-dimensional) outcome working model. 
Simulation studies confirm the desirable finite-sample performance of the proposed estimators relative to a variety of other doubly robust estimators.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"253-82"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0029","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34585342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Spatial Prediction Using Ensemble Machine Learning.","authors":"Molly Margaret Davies, Mark J van der Laan","doi":"10.1515/ijb-2014-0060","DOIUrl":"https://doi.org/10.1515/ijb-2014-0060","url":null,"abstract":"<p><p>Spatial prediction is an important problem in many scientific disciplines. Super Learner is an ensemble prediction approach related to stacked generalization that uses cross-validation to search for the optimal predictor amongst all convex combinations of a heterogeneous candidate set. It has been applied to non-spatial data, where theoretical results demonstrate it will perform asymptotically at least as well as the best candidate under consideration. We review these optimality properties and discuss the assumptions required in order for them to hold for spatial prediction problems. We present results of a simulation study confirming Super Learner works well in practice under a variety of sample sizes, sampling designs, and data-generating functions. We also apply Super Learner to a real world dataset.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"179-201"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2014-0060","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34357980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Second-Order Inference for the Mean of a Variable Missing at Random.","authors":"Iván Díaz, Marco Carone, Mark J van der Laan","doi":"10.1515/ijb-2015-0031","DOIUrl":"https://doi.org/10.1515/ijb-2015-0031","url":null,"abstract":"<p><p>We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE always had a coverage probability equal or closer to the nominal value 0.95, compared to its first-order counterpart. In the best-case scenario, the proposed second-order TMLE had a coverage probability of 0.86 when the first-order TMLE had a coverage probability of zero. We also present a novel first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The first-order estimator proposed is expected to have improved finite sample performance compared to existing first-order estimators. In the best-case scenario of our simulation study, the novel first-order TMLE improved the coverage probability from 0 to 0.90. 
We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"333-49"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0031","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34519917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selection Bias When Using Instrumental Variable Methods to Compare Two Treatments But More Than Two Treatments Are Available.","authors":"Ashkan Ertefaie, Dylan Small, James Flory, Sean Hennessy","doi":"10.1515/ijb-2015-0006","DOIUrl":"https://doi.org/10.1515/ijb-2015-0006","url":null,"abstract":"<p><p>Instrumental variable (IV) methods are widely used to adjust for the bias in estimating treatment effects caused by unmeasured confounders in observational studies. It is common that a comparison between two treatments is focused on and that only subjects receiving one of these two treatments are considered in the analysis even though more than two treatments are available. In this paper, we provide empirical and theoretical evidence that the IV methods may result in biased treatment effects if applied on a data set in which subjects are preselected based on their received treatments. We frame this as a selection bias problem and propose a procedure that identifies the treatment effect of interest as a function of a vector of sensitivity parameters. We also list assumptions under which analyzing the preselected data does not lead to a biased treatment effect estimate. The performance of the proposed method is examined using simulation studies. 
We applied our method on The Health Improvement Network (THIN) database to estimate the comparative effect of metformin and sulfonylureas on weight gain among diabetic patients.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"219-32"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0006","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34585340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue on Data-Adaptive Statistical Inference.","authors":"Antoine Chambaz, Alan Hubbard, Mark J van der Laan","doi":"10.1515/ijb-2016-0033","DOIUrl":"https://doi.org/10.1515/ijb-2016-0033","url":null,"abstract":"The concomitant emergence of big data, explosion of ubiquitous computational resources and democratization of the access to more powerful computing make it necessary and possible to rethink pragmatically the practice of statistics. While numerous machine learning methods provide ever easier access to data-mining tools and sophisticated prediction, there is a growing realization that ad hoc and non-prespecified approaches to high-dimensional problems lend themselves to a proliferation of “findings” of dubious reproducibility. This period of fast-paced evolution is thus a blessing for statistics. It is a golden opportunity to build upon more than a century of methodological research in statistics and five decades of methodological research in machine learning to bend the course of statistics in a new direction, away from the misuse of parametric models and reporting of non-robust inference, to tackle rigorously the challenges that we, as a community, are confronted with. The foundation of statistics is incorporating knowledge about the data-generating experiment through the definition of a statistical model (a set of laws), formalizing the question of interest through the definition of an estimand seen as the value of a statistical parameter (a functional mapping the model to a parameter set) at the true law of the experiment and inferring the estimand based on data yielded by the experiment. Typically, one would construct an estimator of (a collection of key features of) the true law and evaluate the statistical parameter at its value. 
The present special issue broadly focuses on the inference of various statistical parameters in situations where either the data-generating law or the statistical parameter or both are data-adaptively defined and/or estimated. Statistical theory has advanced in sync with scientific computing so practical implementation is now possible for the resulting computationally challenging estimators. We asked researchers currently engaged in cutting-edge research on data-adaptive inferential methods to share their views with us. The result is a compelling collection of advances in statistical theory and practice. The special issue consists of 19 articles. Its theoretical spectrum is wide. Semiparametric models and inference, empirical process theory and machine learning are the three major subfields explored in the articles. Across this special issue, the acceptation of the word inference covers the estimation of finite-dimensional parameters and the construction of confidence regions for them, the estimation of infinite-dimensional features (either as an endgame or as a means to an end), testing hypotheses (for the sake of making discoveries), identifying particular subgroups in a population, selecting (groups or clusters of) significant variables, and comparing data-adaptive predictors. 
Cross-validating, decomposing a task in a series of sub-tasks (by partitioning or relying on a recurrence), fluctuating and weighting are t","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"1"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0033","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34427721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for Data Adaptive Target Parameters.","authors":"Alan E Hubbard, Sara Kherad-Pajouh, Mark J van der Laan","doi":"10.1515/ijb-2015-0013","DOIUrl":"https://doi.org/10.1515/ijb-2015-0013","url":null,"abstract":"Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming “data-driven”, the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. 
The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"3-19"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34427722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sequential Rejection Testing Method for High-Dimensional Regression with Correlated Variables.","authors":"Jacopo Mandozzi, Peter Bühlmann","doi":"10.1515/ijb-2015-0008","DOIUrl":"https://doi.org/10.1515/ijb-2015-0008","url":null,"abstract":"<p><p>We propose a general, modular method for significance testing of groups (or clusters) of variables in a high-dimensional linear model. In the presence of high correlations among the covariables, due to serious problems of identifiability, it is indispensable to focus on detecting groups of variables rather than singletons. We propose an inference method that allows hierarchical structures to be built in. It relies on repeated sample splitting and sequential rejection, and we prove that it asymptotically controls the familywise error rate. It can be implemented on any collection of clusters and leads to improved power in comparison to more standard non-sequential rejection methods. We complement the theoretical analysis with empirical results for simulated and real data.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"79-95"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34585337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence Re-weighted G-Estimation.","authors":"Benjamin Rich, Erica E M Moodie, David A Stephens","doi":"10.1515/ijb-2015-0015","DOIUrl":"https://doi.org/10.1515/ijb-2015-0015","url":null,"abstract":"<p><p>Individualized medicine is an area that is growing, both in clinical and statistical settings, where in the latter, personalized treatment strategies are often referred to as dynamic treatment regimens. Estimation of the optimal dynamic treatment regime has focused primarily on semi-parametric approaches, some of which are said to be doubly robust in that they give rise to consistent estimators provided at least one of two models is correctly specified. In particular, the locally efficient doubly robust g-estimation is robust to misspecification of the treatment-free outcome model so long as the propensity model is specified correctly, at the cost of an increase in variability. In this paper, we propose data-adaptive weighting schemes that serve to decrease the impact of influential points and thus stabilize the estimator. In doing so, we provide a doubly robust g-estimator that is also robust in the sense of Hampel (15).</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"12 1","pages":"157-77"},"PeriodicalIF":1.2,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2015-0015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34059729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of Depth Measure for Multivariate Functional Data in Disease Prediction: An Application to Electrocardiograph Signals.","authors":"Nicholas Tarabelloni, Francesca Ieva, Rachele Biasi, Anna Maria Paganoni","doi":"10.1515/ijb-2014-0041","DOIUrl":"https://doi.org/10.1515/ijb-2014-0041","url":null,"abstract":"<p><p>In this paper we develop statistical methods to compare two independent samples of multivariate functional data that differ in terms of covariance operators. In particular we generalize the concept of depth measure to this kind of data, exploiting the role of the covariance operators in weighting the components that define the depth. Two simulation studies are carried out to validate the robustness of the proposed methods and to test their effectiveness in some settings of interest. We present an application to Electrocardiographic (ECG) signals aimed at comparing physiological subjects and patients affected by Left Bundle Branch Block. The proposed depth measures computed on data are then used to perform a nonparametric comparison test between these two populations. They are also introduced into a generalized regression model aimed at classifying the ECG signals.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"11 2","pages":"189-201"},"PeriodicalIF":1.2,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2014-0041","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33419632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}