{"title":"Biostatistics Faculty and NIH Awards at U.S. Medical Schools.","authors":"Guangxiang Zhang, John J Chen","doi":"10.1080/00031305.2014.992959","DOIUrl":"https://doi.org/10.1080/00031305.2014.992959","url":null,"abstract":"Statistical principles and methods are critical to the success of biomedical and translational research. However, it is difficult to track and evaluate the monetary value of a biostatistician to a school of medicine (SoM). Limited published data on this topic is available, especially comparing across SoMs. Using National Institutes of Health (NIH) awards and Association of American Medical Colleges (AAMC) faculty counts data (2010-2013), together with online information on biostatistics faculty from 119 institutions across the country, we demonstrated that the number of biostatistics faculty was significantly positively associated with the amount of NIH awards, both as a school total and on a per faculty basis, across various sizes of U.S. SoMs. Biostatisticians, as a profession, need to be proactive in communicating and advocating the value of their work and their unique contribution to the long-term success of a biomedical research enterprise.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"69 1","pages":"34-40"},"PeriodicalIF":1.8,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2014.992959","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33205177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Paradoxical Result in Estimating Regression Coefficients.","authors":"Thaddeus Tarpey, R Todd Ogden, Eva Petkova, Ronald Christensen","doi":"10.1080/00031305.2014.940467","DOIUrl":"https://doi.org/10.1080/00031305.2014.940467","url":null,"abstract":"This paper presents a counterintuitive result regarding the estimation of a regression slope coefficient. Paradoxically, the precision of the slope estimator can deteriorate when additional information is used to estimate its value. In a randomized experiment, the distribution of baseline variables should be identical across treatments due to randomization. The motivation for this paper came from noting that the precision of slope estimators deteriorated when pooling baseline predictors across treatment groups.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"68 4","pages":"271-276"},"PeriodicalIF":1.8,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2014.940467","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33002968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vardeman, S. B. and Morris, M. D. (2013), \"Majority Voting by Independent Classifiers can Increase Error Rates,\" <i>The American Statistician</i>, 67, 94-96: Comment by Baker, Xu, Hu, and Huang and Reply.","authors":"Stuart G Baker, Jian-Lun Xu, Ping Hu, Peng Huang","doi":"10.1080/00031305.2014.882867","DOIUrl":"https://doi.org/10.1080/00031305.2014.882867","url":null,"abstract":"Vardeman and Morris (VM) found a counterexample to the assertion that a majority voting classifier always performs better than its independent component classifiers. VM's counterexample applies to independent classifiers, but biostatisticians are often more interested in conditionally independent classifiers. In biomedical studies, where class is disease status, classifiers are inherently dependent simply because positivity of any reasonable classifier depends on the presence or absence of disease. Conditional independence of classifiers, given disease status, could arise if the classifiers are detecting different biological phenomena, such as tissue abnormalities versus protein markers.\n\nTo explore how majority voting affects classification performance with conditionally independent classifiers, we investigated many examples (Figure 1). Much as we expected, we found that it generally works quite well. However, we also found that conditional independence is not a sufficient condition to ensure that majority voting always leads to better classification performance than the individual classifiers.\n\n[Figure 1: Comparison of ROC curves for majority voting classifier and conditionally independent component classifiers. The 45-degree line is included for reference.]\n\nAs with VM, we considered two classes and component classifiers with identical classification performances. To measure classification performance we used receiver operating characteristic (ROC) curves. ROC curves play a central role in the evaluation of diagnostic and screening tests (Baker 2003; Pepe 2003). In accordance with a decision theory view of ROC curves (Baker, Van Calster, and Steyerberg 2012), we restricted our investigation to ROC curves that are concave, namely with monotonically decreasing slopes from left to right. For a given cutpoint x of a score, let fpr(x) and tpr(x) denote the false positive and true positive rates of the component classifier. The ROC curve for the component classifier plots tpr(x) versus fpr(x). At a given cutpoint, the true positive rate for the majority voting classifier is the probability of three or exactly two true positives among the component classifiers, namely tpr_M(x) = tpr(x)^3 + 3 tpr(x)^2 {1 − tpr(x)}. Similarly the false positive rate for the majority voting classifier is fpr_M(x) = fpr(x)^3 + 3 fpr(x)^2 {1 − fpr(x)}. The ROC curve for the majority voting classifier plots tpr_M(x) versus fpr_M(x). We considered the following six cases.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"68 2","pages":"125-126"},"PeriodicalIF":1.8,"publicationDate":"2014-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2014.882867","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33302005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kurtosis as Peakedness, 1905 - 2014. <i>R.I.P.</i>","authors":"Peter H Westfall","doi":"10.1080/00031305.2014.917055","DOIUrl":"https://doi.org/10.1080/00031305.2014.917055","url":null,"abstract":"The incorrect notion that kurtosis somehow measures \"peakedness\" (flatness, pointiness or modality) of a distribution is remarkably persistent, despite attempts by statisticians to set the record straight. This article puts the notion to rest once and for all. Kurtosis tells you virtually nothing about the shape of the peak - its only unambiguous interpretation is in terms of tail extremity; i.e., either existing outliers (for the sample kurtosis) or propensity to produce outliers (for the kurtosis of a probability distribution). To clarify this point, relevant literature is reviewed, counterexample distributions are given, and it is shown that the proportion of the kurtosis that is determined by the central μ ± σ range is usually quite small.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"68 3","pages":"191-195"},"PeriodicalIF":1.8,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2014.917055","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33377287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simple Density-Based Empirical Likelihood Ratio Test for Independence.","authors":"Albert Vexler, Wan-Min Tsai, Alan D Hutson","doi":"10.1080/00031305.2014.901922","DOIUrl":"https://doi.org/10.1080/00031305.2014.901922","url":null,"abstract":"We develop a novel nonparametric likelihood ratio test for independence between two random variables using a technique that is free of the common constraints of defining a given set of specific dependence structures. Our methodology revolves around an exact density-based empirical likelihood ratio test statistic that approximates in a distribution-free fashion the corresponding most powerful parametric likelihood ratio test. We demonstrate that the proposed test is very powerful in detecting general structures of dependence between two random variables, including non-linear and/or random-effect dependence structures. An extensive Monte Carlo study confirms that the proposed test is superior to the classical nonparametric procedures across a variety of settings. The real-world applicability of the proposed test is illustrated using data from a study of biomarkers associated with myocardial infarction.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"68 3","pages":"158-169"},"PeriodicalIF":1.8,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2014.901922","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32742148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discussion of \"The Need for More Emphasis on Prediction: A 'Nondenominational' Model-Based Approach\"","authors":"Hal Stern","doi":"10.1080/00031305.2014.897257","DOIUrl":"https://doi.org/10.1080/00031305.2014.897257","url":null,"abstract":"David Harville has provided a compelling case for an increased focus on prediction in the teaching of statistics. I am very sympathetic to Harville’s plea. Indeed, it was the ability of statistical methods to address prediction problems (in sports and finance, the same fields that Harville mentions) that attracted me to the field of statistics more than 30 years ago. Even in application areas where the focus has been on parameter estimation, e.g., regression coefficients in economics or treatment effects in clinical trials in medicine, it seems quite natural to me to think of these parameter estimates in terms of the predictions that they imply. Given that I agree with Harville on the central role of prediction, my comments below concern his focus on models and the relevance of the “nondenominational” approach.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"68 2","pages":"83-84"},"PeriodicalIF":1.8,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2014.897257","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32447447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.","authors":"Yoonsang Kim, Young-Ku Choi, Sherry Emery","doi":"10.1080/00031305.2013.817357","DOIUrl":"https://doi.org/10.1080/00031305.2013.817357","url":null,"abstract":"Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"67 3","pages":""},"PeriodicalIF":1.8,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2013.817357","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31913836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Note on an Identity Between Two Unbiased Variance Estimators for the Grand Mean in a Simple Random Effects Model.","authors":"Bruce Levin, Cheng-Shiun Leu","doi":"10.1080/00031305.2012.752105","DOIUrl":"https://doi.org/10.1080/00031305.2012.752105","url":null,"abstract":"We demonstrate the algebraic equivalence of two unbiased variance estimators for the sample grand mean in a random sample of subjects from an infinite population where subjects provide repeated observations following a homoscedastic random effects model.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"67 1","pages":"42-43"},"PeriodicalIF":1.8,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/00031305.2012.752105","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31454485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?","authors":"Yeyi Zhu, Ladia M Hernandez, Peter Mueller, Yongquan Dong, Michele R Forman","doi":"10.1080/00031305.2013.842498","DOIUrl":"https://doi.org/10.1080/00031305.2013.842498","url":null,"abstract":"The aim of this paper is to address issues in research that may be missing from statistics classes but are important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and help achieve optimal conduct, quality control, analysis, and interpretation of a study.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"67 4","pages":"235-241"},"PeriodicalIF":1.8,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3912269/pdf/nihms537499.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32104198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview of Current Software Procedures for Fitting Linear Mixed Models.","authors":"Brady T West, Andrzej T Galecki","doi":"10.1198/tas.2011.11077","DOIUrl":"https://doi.org/10.1198/tas.2011.11077","url":null,"abstract":"At present, there are many software procedures available that enable statisticians to fit linear mixed models (LMMs) to continuous dependent variables in clustered or longitudinal datasets. LMMs are flexible tools for analyzing relationships among variables in these types of datasets, in that a variety of covariance structures can be used depending on the subject matter under study. The explicit random effects in LMMs allow analysts to make inferences about the variability between clusters or subjects in larger hypothetical populations, and examine cluster- or subject-level variables that explain portions of this variability. These models can also be used to analyze longitudinal or clustered datasets with data that are missing at random (MAR), and can accommodate time-varying covariates in longitudinal datasets. Although the software procedures currently available have many features in common, more specific analytic aspects of fitting LMMs (e.g., crossed random effects, appropriate hypothesis testing for variance components, diagnostics, incorporating sampling weights) may only be available in selected software procedures. With this article, we aim to perform a comprehensive and up-to-date comparison of the current capabilities of software procedures for fitting LMMs, and provide statisticians with a guide for selecting a software procedure appropriate for their analytic goals.","PeriodicalId":50801,"journal":{"name":"American Statistician","volume":"65 4","pages":"274-282"},"PeriodicalIF":1.8,"publicationDate":"2012-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1198/tas.2011.11077","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31375746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}