{"title":"Transforming data into actionable insights","authors":"C. Clancy","doi":"10.1080/24709360.2019.1704127","DOIUrl":"https://doi.org/10.1080/24709360.2019.1704127","url":null,"abstract":"","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"4 1","pages":"1 - 2"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1704127","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45784969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making causal inferences about treatment effect sizes from observational datasets","authors":"T. Kashner, Steven S. Henley, R. Golden, Xiao‐Hua Zhou","doi":"10.1080/24709360.2019.1681211","DOIUrl":"https://doi.org/10.1080/24709360.2019.1681211","url":null,"abstract":"In the era of big data and cloud computing, analysts need statistical models to go beyond predicting outcomes to forecasting how outcomes change when decision-makers intervene to change one or more causal factors. This paper reviews methods to estimate the causal effects of treatment choices on patient health outcomes using observational datasets. Methods are limited to those that model choice of treatment (propensity scoring) and treatment outcomes (instrumental variable, difference in differences, control function). A regression framework was developed to show how unobserved confounding covariates and heterogeneous outcomes can introduce biases to effect size estimates. In response to criticisms that outcome approaches are not systematic and are subject to model misspecification error, we extend the control function approach of Lu and White by applying Best Approximating Model technology (BAM-CF). Results from simulation experiments are presented to compare biases between BAM-CF and propensity scoring in the presence of an unobserved confounder. We conclude that no one strategy is ‘optimal’ for all datasets, and analysts should consider multiple approaches to assess robustness. For both observational and randomized datasets, researchers should assess how moderating covariates impact estimates of treatment effect sizes so that clinicians can understand what is best for each individual patient.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"4 1","pages":"48 - 83"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1681211","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44059431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Common errors of interpretation in biostatistics","authors":"Elsa Vazquez Arreola, Kyle M. Irimata, Jeffrey R. Wilson","doi":"10.1080/24709360.2020.1790085","DOIUrl":"https://doi.org/10.1080/24709360.2020.1790085","url":null,"abstract":"What do we wish to investigate? While this may be a common question in research, it does not always come with straightforward answers. This article reviews data-driven methods of collection, questions asked and questions answered, and the myriad of different conclusions that may result. We examine differences in answers to questions based on independent versus correlated observations, bivariate versus conditional associations, relations versus extrapolation, and single membership versus multiple membership modeling. Regardless of the issue, these differences are usually not due to so-called bad data or due to bad models; they are usually due to the investigators misinterpreting the answers that were given. Most importantly, one cannot ask a question and obtain an answer without understanding the data structure, its size and its representativeness. Simply stated, the fact that I went to the store and bought an outfit does not mean the outfit is appropriate for the event. The answers obtained may not be answering the question of interest.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"4 1","pages":"238 - 246"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2020.1790085","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43256200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical modeling methods: challenges and strategies","authors":"Steven S. Henley, R. Golden, T. Kashner","doi":"10.1080/24709360.2019.1618653","DOIUrl":"https://doi.org/10.1080/24709360.2019.1618653","url":null,"abstract":"Statistical modeling methods are widely used in clinical science, epidemiology, and health services research to analyze data that has been collected in clinical trials as well as observational studies of existing data sources, such as claims files and electronic health records. Diagnostic and prognostic inferences from statistical models are critical to researchers advancing science, clinical practitioners making patient care decisions, and administrators and policy makers impacting the health care system to improve quality and reduce costs. The veracity of such inferences relies not only on the quality and completeness of the collected data, but also on statistical model validity. A key component of establishing model validity is determining when a model is not correctly specified and therefore incapable of adequately representing the Data Generating Process (DGP). In this article, model validity is first described and methods designed for assessing model fit, specification, and selection are reviewed. Second, data transformations that improve the model’s ability to represent the DGP are addressed. Third, model search and validation methods are discussed. Finally, methods for evaluating predictive and classification performance are presented. Together, these methods provide a practical framework with recommendations to guide the development and evaluation of statistical models that provide valid statistical inferences.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"4 1","pages":"105 - 139"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1618653","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47377251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developments and debates on latent variable modeling in diagnostic studies when there is no gold standard","authors":"Zheyu Wang","doi":"10.1080/24709360.2019.1673623","DOIUrl":"https://doi.org/10.1080/24709360.2019.1673623","url":null,"abstract":"Latent variable modeling is often used in diagnostic studies where a gold standard reference test is not available. Its applications have become increasingly popular with the rapid discovery of novel biomarkers and the effort to improve healthcare for each individual. This paper attempts to provide a review of current developments and debates on these models, with a focus on diagnostic studies, and to discuss the value as well as cautionary considerations in applying these models.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"100 - 117"},"PeriodicalIF":0.0,"publicationDate":"2019-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1673623","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45189867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How many clusters exist? Answer via maximum clustering similarity implemented in R","authors":"A. Albatineh, M. Wilcox, B. Zogheib, M. Niewiadomska-Bugaj","doi":"10.1080/24709360.2019.1615770","DOIUrl":"https://doi.org/10.1080/24709360.2019.1615770","url":null,"abstract":"Finding the number of clusters in a data set is considered one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), for finding the optimal number of clusters, into R statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850.] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50.] indices, corrected for chance agreement. The number of clusters at which the index most frequently attains its maximum is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R are implemented in MCSim. A graph of the number of clusters vs. cluster similarity using corrected similarity indices is produced. Values of the similarity indices and a clustering tree (dendrogram) are also produced. Several examples including simulated, real, and circular data sets are presented to show how MCSim successfully works in practice.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"3 1","pages":"62 - 79"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1615770","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42954294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cohort study design for illness-death processes with disease status under intermittent observation","authors":"Nathalie C. Moon, Leilei Zeng, R. Cook","doi":"10.1080/24709360.2019.1699341","DOIUrl":"https://doi.org/10.1080/24709360.2019.1699341","url":null,"abstract":"Cohort studies are routinely conducted to learn about the incidence or progression rates of chronic diseases. The illness-death model offers a natural framework for joint consideration of non-fatal events in the semi-competing risks setting. We consider the design of prospective cohort studies where the goal is to estimate the effect of a marker on the risk of a non-fatal event which is subject to interval-censoring due to an intermittent observation scheme. The sample size is shown to depend on the effect of interest, the number of assessments, and the duration of follow-up. Minimum-cost designs are also developed to account for the different costs of recruitment and follow-up examination. We also consider the setting where the event status of individuals is observed subject to misclassification; the consequent need to increase the sample size to account for this error is illustrated through asymptotic calculations.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"3 1","pages":"178 - 200"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1699341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47561742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modified sparse functional principal component analysis for fMRI data process","authors":"Zhengyang Fang, J. Y. Han, N. Simon, Xiaoping Zhou","doi":"10.1080/24709360.2019.1591072","DOIUrl":"https://doi.org/10.1080/24709360.2019.1591072","url":null,"abstract":"Sparse and functional principal component analysis is a technique to extract sparse and smooth principal components from a matrix. In this paper, we propose a modified sparse and functional principal component analysis model for feature extraction. We evaluate the tuning parameters by their robustness against random perturbation, and select the tuning parameters by derivative-free optimization. We test our algorithm on the ADNI dataset to distinguish between patients with Alzheimer's disease and the control group. By applying proper classification methods for sparse features, we obtain better results than classic singular value decomposition, support vector machine, and logistic regression.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"3 1","pages":"80 - 89"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1591072","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46473035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A response adaptive design for ordinal categorical responses weighing the cumulative odds ratios","authors":"A. Biswas, Rahul Bhattacharya, Soumyadeep Das","doi":"10.1080/24709360.2019.1660111","DOIUrl":"https://doi.org/10.1080/24709360.2019.1660111","url":null,"abstract":"Weighing the cumulative odds ratios suitably, a two-treatment response-adaptive design for phase III clinical trials is proposed for ordinal categorical responses. Properties of the proposed design are investigated theoretically as well as empirically. Applicability of the design is further verified using data from a real clinical trial with trauma patients, where the responses are observed on an ordinal categorical scale.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"3 1","pages":"109 - 125"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2019.1660111","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47455932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression Trees for Longitudinal Data with Baseline Covariates.","authors":"Madan Gopal Kundu, Jaroslaw Harezlak","doi":"10.1080/24709360.2018.1557797","DOIUrl":"https://doi.org/10.1080/24709360.2018.1557797","url":null,"abstract":"Longitudinal changes in a population of interest are often heterogeneous and may be influenced by a combination of baseline factors. In such cases, traditional linear mixed effects models (Laird and Ware, 1982) assuming a common parametric form for the mean structure may not be applicable. We show that the regression tree methodology for longitudinal data can identify and characterize longitudinally homogeneous subgroups. Most of the currently available regression tree construction methods are either limited to a repeated measures scenario or combine the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under the conditional inference framework (Hothorn, Hornik and Zeileis, 2006) that overcomes these limitations utilizing a two-step approach. The LongCART algorithm first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type-I error controlled, which guards against variable selection bias, over-fitting and spurious splitting. We have obtained the asymptotic results for the proposed instability test and examined its finite sample behavior through simulation studies. The comparative performance of the LongCART algorithm was evaluated empirically via simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"3 1","pages":"1-22"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2018.1557797","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36896395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}