{"title":"Statistical Depth Meets Machine Learning: Kernel Mean Embeddings and Depth in Functional Data Analysis","authors":"George Wynne, Stanislav Nagy","doi":"10.1111/insr.12611","DOIUrl":"https://doi.org/10.1111/insr.12611","url":null,"abstract":"<div><p>Statistical depth gauges how representative a point is compared with a reference probability measure. Depth makes it possible to introduce rankings and orderings for data in multivariate or function spaces. Although functional depths are widely applied and have had much experimental success, little theoretical progress has been made in analysing them. This article highlights how the common <math><mi>h</mi></math>-depth and related depths from functional data analysis can be viewed as a kernel mean embedding, a tool widely used in statistical machine learning. This facilitates answers to several open questions regarding the statistical properties of functional depths. We show that (i) <math><mi>h</mi></math>-depth has the interpretation of a kernel-based method; (ii) several <math><mi>h</mi></math>-depths possess explicit expressions, without the need to estimate them using Monte Carlo procedures; (iii) under minimal assumptions, <math><mi>h</mi></math>-depths and their maximisers are uniformly strongly consistent and asymptotically Gaussian (also in infinite-dimensional spaces and for imperfectly observed functional data); and (iv) several <math><mi>h</mi></math>-depths uniquely characterise probability distributions in separable Hilbert spaces. In addition, we provide a link between the depth and empirical characteristic function-based procedures for functional data. Finally, the unveiled connections enable the design of an extension of the <math><mi>h</mi></math>-depth towards regression problems.</p></div>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"317-348"},"PeriodicalIF":1.8,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144774096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Screening for Ultrahigh Dimensional Mixed Data via Wasserstein Distance","authors":"Bing Tian, Hong Wang","doi":"10.1111/insr.12609","DOIUrl":"https://doi.org/10.1111/insr.12609","url":null,"abstract":"<div><p>This article develops a novel feature screening procedure for ultrahigh dimensional mixed data based on Wasserstein distance, termed Wasserstein-SIS. To handle a mixture of continuous and discrete data, we use Wasserstein distance as a new marginal utility that measures the difference between the joint distribution and the product of the marginal distributions. In theory, we establish the sure screening property under less restrictive assumptions on data types. The proposed procedure does not require model specification, provides a more effective geometric measure of the discrepancy between distributions and avoids the biases introduced by the choice of slicing rules for continuous data. Numerical comparisons indicate that the proposed Wasserstein-SIS method outperforms existing methods across various models. A real data application further demonstrates the practical value of Wasserstein-SIS.</p></div>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"267-287"},"PeriodicalIF":1.8,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144774049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Do Applied Researchers Use the Causal Forest? A Methodological Review","authors":"Patrick Rehill","doi":"10.1111/insr.12610","DOIUrl":"https://doi.org/10.1111/insr.12610","url":null,"abstract":"<p>This methodological review examines the use of the causal forest method by applied researchers across 133 peer-reviewed papers. It shows that the emerging best practice relies heavily on the approach and tools created by the original authors of the causal forest, such as their grf package and the approaches given in their examples. Generally, researchers apply the causal forest to relatively low-dimensional datasets, relying on observed controls or, in some cases, experiments to identify effects. Results are then commonly communicated in several ways: by mapping out the univariate distribution of individual-level treatment effect estimates, by displaying variable importance results for the forest and by graphing the distribution of treatment effects across covariates that are important either for theoretical reasons or because they have high variable importance. Some deviations from this common practice are interesting and deserve further development and use. Others are unnecessary or even harmful. The paper concludes by reflecting on the emerging best practice for causal forest use and paths for future research.</p>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"288-316"},"PeriodicalIF":1.8,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/insr.12610","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144774102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Number of Components for Matrix-Variate Mixtures: A Comparison Among Information Criteria","authors":"Salvatore D. Tomarchio, Antonio Punzo","doi":"10.1111/insr.12607","DOIUrl":"https://doi.org/10.1111/insr.12607","url":null,"abstract":"<p>This study explores the crucial task of determining the optimal number of components in mixture models, known as the mixture order, when considering matrix-variate data. Despite the growing interest in this data type among practitioners and researchers, the effectiveness of information criteria in selecting the mixture order remains largely unexplored in this branch of the literature. Although the Bayesian information criterion (BIC) is commonly used, its effectiveness has been only marginally tested in this context, and several other potentially valuable criteria exist. An extensive simulation study evaluates the performance of 10 information criteria across various data structures, focusing specifically on matrix-variate normal mixtures.</p>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"222-245"},"PeriodicalIF":1.8,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/insr.12607","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144774150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chance-Corrected Interrater Agreement Statistics for Two-Rater Dichotomous Responses: A Method Review With Comparative Assessment Under Possibly Correlated Decisions","authors":"Zizhong Tian, Vernon M. Chinchilli, Chan Shen, Shouhao Zhou","doi":"10.1111/insr.12606","DOIUrl":"https://doi.org/10.1111/insr.12606","url":null,"abstract":"<p>Measurement of the interrater agreement (IRA) is critical for assessing the reliability and validity of ratings in various disciplines. While numerous IRA statistics have been developed, there is a lack of guidance on selecting appropriate measures, especially when raters' decisions could be correlated. To address this gap, we review a family of chance-corrected IRA statistics for two-rater dichotomous-response cases, a fundamental setting that not only serves as the theoretical foundation for categorical-response or multirater IRA methods but is also practically dominant in most empirical studies, and we propose a novel data-generating framework to simulate correlated decision processes between raters. Subsequently, a new estimand, which calibrates the ‘true’ chance-corrected IRA, is introduced while accounting for the potential ‘probabilistic certainty’. Extensive simulations were conducted to evaluate the performance of the reviewed IRA methods under various practical scenarios and were summarised by an agglomerative hierarchical clustering analysis. Finally, we provide recommendations for selecting appropriate IRA statistics based on outcome prevalence and rater characteristics and highlight the need for further advancements in IRA estimation methodologies.</p>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"199-221"},"PeriodicalIF":1.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/insr.12606","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144773939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Estimation of Number of Trials in Binomial Distribution","authors":"Mina Georgieva, Brani Vidakovic","doi":"10.1111/insr.12608","DOIUrl":"https://doi.org/10.1111/insr.12608","url":null,"abstract":"<div><p>Estimating the parameter <math><mi>n</mi></math> when <math><mi>p</mi></math> is known, or simultaneously estimating <math><mi>n</mi></math> and <math><mi>p</mi></math> of the binomial distribution based on <math><mi>k</mi><mo>≥</mo><mn>1</mn></math> independent observations, has been considered by many authors over the last several decades. A range of estimators have been proposed, and questions regarding their asymptotic and small-sample properties have received adequate treatment. In this paper, we provide an extensive review and a comprehensive performance comparison of the estimators from the literature. We propose a conceptually simple estimator of <math><mi>n</mi></math> that uses the marginal likelihood, with <math><mi>p</mi></math> integrated out, and is obtained by simultaneous optimisation with respect to <math><mi>n</mi></math> and the hyperparameters. We compare the proposed estimator with various existing estimators and find its performance competitive and, in some scenarios, superior.</p></div>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"246-266"},"PeriodicalIF":1.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144773904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Conversation With Amy Racine-Poon","authors":"Oleksandr Sverdlov","doi":"10.1111/insr.12605","DOIUrl":"https://doi.org/10.1111/insr.12605","url":null,"abstract":"<div><p><b>Professor Dr. Amy Racine-Poon is best known for her interdisciplinary contributions as an applied Bayesian statistician in the pharmaceutical industry and healthcare. She was born in Hong Kong and obtained a BA with upper honors in Mathematics (1970) from the Chinese University of Hong Kong. She earned a PhD in statistics from the University of California, Berkeley, under the supervision of Erich L. Lehmann. She worked as a Lecturer at the Department of Statistics at UC Berkeley (1975–1977) and as a Statistician at the Biometry Branch of the National Institute of Environmental Health in Research Triangle Park, North Carolina (1977–1980). Amy moved to Basel, Switzerland, in 1981 to join Ciba-Geigy/Novartis AG, where she worked for 42 years (1981–2023) across different therapeutic areas and stages of drug development, applying her skills in advanced statistical and pharmacometric methodologies that led to the development of a large number of new drugs. During her career, she was also a Visiting Professor at the Department of Mathematics, Imperial College London (1995–1997) and a Volunteer Statistical Expert at the Bill & Melinda Gates Foundation, Seattle, Washington (2015–2019). Amy Racine-Poon's numerous honors include the Royal Statistical Society Greenfield Industrial Medal for Innovative Use of Statistics in the Industries (1995), Fellow of the American Statistical Association (1997), Novartis Distinguished Scientist Award (1999), American Statistical Association Youden Interlaboratory Research Award (2020) and the Sheiner-Beal Pharmacometrics Award (2024) from the American Society of Clinical Pharmacology and Therapeutics. The following conversation took place between Oleksandr Sverdlov (Alex) and Amy Racine-Poon (Amy) in October 2024.</b></p></div>","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"93 2","pages":"183-198"},"PeriodicalIF":1.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144773905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistics: Multivariate Data Integration Using R; Methods and Applications With the mixOmics Package, Kim-Anh Lê Cao, Zoe Marie Welham, Chapman & Hall/CRC, 2021, xxi + 308 pages, £84.99/$115.00, hardcover ISBN: 978-1032128078, eBook ISBN: 9781003026860","authors":"Krzysztof Podgórski","doi":"10.1111/insr.12599","DOIUrl":"https://doi.org/10.1111/insr.12599","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"92 3","pages":"483-484"},"PeriodicalIF":1.7,"publicationDate":"2024-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142579677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Theory and Applications: Hands-On Use Cases With Python on Classical and Quantum Machines, Xavier Vasques, John Wiley & Sons, 2024, xx + 487 pages, $89.95, hardcover ISBN: 978-1-394-22061-8","authors":"Shuangzhe Liu","doi":"10.1111/insr.12602","DOIUrl":"https://doi.org/10.1111/insr.12602","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"92 3","pages":"490-491"},"PeriodicalIF":1.7,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142579757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}