{"title":"Variable selection methods for model-based clustering","authors":"Michael Fop, T. B. Murphy","doi":"10.1214/18-SS119","DOIUrl":"https://doi.org/10.1214/18-SS119","url":null,"abstract":"Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"210 1","pages":"18-65"},"PeriodicalIF":3.3,"publicationDate":"2017-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76116167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey of bootstrap methods in finite population sampling","authors":"Z. Mashreghi, D. Haziza, C. Léger","doi":"10.1214/16-SS113","DOIUrl":"https://doi.org/10.1214/16-SS113","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"12 1","pages":"1-52"},"PeriodicalIF":3.3,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73260692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistics Surveys. Pub Date: 2016-01-01. Epub Date: 2016-11-17. DOI: 10.1214/16-SS116
Julie Josse, Susan Holmes
{"title":"Measuring multivariate association and beyond.","authors":"Julie Josse, Susan Holmes","doi":"10.1214/16-SS116","DOIUrl":"https://doi.org/10.1214/16-SS116","url":null,"abstract":"Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association's underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarii where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"10 ","pages":"132-167"},"PeriodicalIF":3.3,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1214/16-SS116","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35553938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some models and methods for the analysis of observational data","authors":"J. A. Ferreira","doi":"10.1214/15-SS110","DOIUrl":"https://doi.org/10.1214/15-SS110","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"214 1","pages":"106-208"},"PeriodicalIF":3.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75584060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-Parametric Estimation for Conditional Independence Multivariate Finite Mixture Models","authors":"D. Chauveau, D. Hunter, M. Levine","doi":"10.1214/15-SS108","DOIUrl":"https://doi.org/10.1214/15-SS108","url":null,"abstract":"The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"1 1","pages":"1-31"},"PeriodicalIF":3.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82153457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of spatial predictors when datasets could be very large","authors":"J. Bradley, N. Cressie, Tao Shi","doi":"10.1214/16-SS115","DOIUrl":"https://doi.org/10.1214/16-SS115","url":null,"abstract":"In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $\\mathrm{CO}_{2}$ data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"9 1","pages":"100-131"},"PeriodicalIF":3.3,"publicationDate":"2014-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81924992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistics Surveys. Pub Date: 2014-01-01. Epub Date: 2014-12-09. DOI: 10.1214/14-SS107
Adrien Saumard, Jon A Wellner
{"title":"Log-Concavity and Strong Log-Concavity: a review.","authors":"Adrien Saumard, Jon A Wellner","doi":"10.1214/14-SS107","DOIUrl":"https://doi.org/10.1214/14-SS107","url":null,"abstract":"We review and formulate results concerning log-concavity and strong log-concavity in both discrete and continuous settings. We show how preservation of log-concavity and strong log-concavity on ℝ under convolution follows from a fundamental monotonicity result of Efron (1969). We provide a new proof of Efron's theorem using the recent asymmetric Brascamp-Lieb inequality due to Otto and Menz (2013). Along the way we review connections between log-concavity and other areas of mathematics and statistics, including concentration of measure, log-Sobolev inequalities, convex geometry, MCMC algorithms, Laplace approximations, and machine learning.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"8 ","pages":"45-114"},"PeriodicalIF":3.3,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1214/14-SS107","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34446771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive clinical trial designs for phase I cancer studies","authors":"O. Sverdlov, W. Wong, Y. Ryeznik","doi":"10.1214/14-SS106","DOIUrl":"https://doi.org/10.1214/14-SS106","url":null,"abstract":"Adaptive clinical trials are becoming increasingly popular research designs for clinical investigation. Adaptive designs are particularly useful in phase I cancer studies where clinical data are scant and the goals are to assess the drug dose-toxicity profile and to determine the maximum tolerated dose while minimizing the number of study patients treated at suboptimal dose levels. In the current work we give an overview of adaptive design methods for phase I cancer trials. We find that modern statistical literature is replete with novel adaptive designs that have clearly defined objectives and established statistical properties, and are shown to outperform conventional dose finding methods such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at highly toxic or nonefficacious doses. We discuss statistical, logistical, and regulatory aspects of these designs and present some links to non-commercial statistical software for implementing these methods in practice. MSC 2010 subject classifications: Primary 62L05, 62L10, 62L12; secondary 62L20.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"154 1","pages":"2-44"},"PeriodicalIF":3.3,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77126913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"$M$-functionals of multivariate scatter","authors":"L. Duembgen, Markus Pauly, T. Schweizer","doi":"10.1214/15-SS109","DOIUrl":"https://doi.org/10.1214/15-SS109","url":null,"abstract":"This survey provides a self-contained account of M-estimation of multivariate scatter. In particular, we present new proofs for existence of the underlying M-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. By doing so we reveal a connection between Tyler's (1987) M-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally these results are applied to M-estimation of multivariate location and scatter via multivariate t-distributions.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"49 1","pages":"32-105"},"PeriodicalIF":3.3,"publicationDate":"2013-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76269753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The implementation of cross-sectional weights in household panel surveys","authors":"Matthias Schonlau, M. Kroh, N. Watson","doi":"10.1214/13-SS104","DOIUrl":"https://doi.org/10.1214/13-SS104","url":null,"abstract":"While household panel surveys are longitudinal in nature, cross-sectional sampling weights are also of interest. The computation of cross-sectional weights is challenging because household compositions change over time. Sampling probabilities of household entrants after wave 1 are generally not known and assigning them zero weight is not satisfying. Two common approaches to cross-sectional weighting address this issue: (1) “shared weights” and (2) modeling or estimating unobserved sampling probabilities based on person-level characteristics. We survey how several well-known national household panels address cross-sectional weights for different groups of respondents (including immigrants and births) and in different situations (including household mergers and splits). When a new person moves into a household, both “shared weights” and “modeling” lead to reduced individual weights of pre-existing household members, but differences due to the approach arise elsewhere. The implementation of “shared weights” is problematic when the panel contains households without a household member already present in wave 1. Panels also differ in the treatment of immigrants, household mergers, and sometimes on how weights are assigned to children born to wave 1 panel members.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":" 10","pages":"37-57"},"PeriodicalIF":3.3,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1214/13-SS104","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72385387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}