{"title":"Statistics: Multivariate Data Integration Using R; Methods and Applications With the mixOmics Package Kim-Anh Lê Cao, Zoe Marie Welham, Chapman & Hall/CRC, 2021, xxi + 308 pages, £84.99/$115.00, hardcover ISBN: 978-1032128078 eBook ISBN: 9781003026860","authors":"Krzysztof Podgórski","doi":"10.1111/insr.12599","DOIUrl":"https://doi.org/10.1111/insr.12599","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"92 3","pages":"483-484"},"PeriodicalIF":1.7,"publicationDate":"2024-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142579677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Theory and Applications: Hands-On Use Cases With Python on Classical and Quantum Machines, Xavier Vasques, John Wiley & Sons, 2024, xx + 487 pages, $89.95, hardcover ISBN: 978-1-394-22061-8","authors":"Shuangzhe Liu","doi":"10.1111/insr.12602","DOIUrl":"https://doi.org/10.1111/insr.12602","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"92 3","pages":"490-491"},"PeriodicalIF":1.7,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142579757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Oriented Data Analysis J. S. Marron and I. L. Dryden, Chapman & Hall/CRC, 2022, xii + 424 pages, softcover ISBN: 978-0-8153-9282-8 (hbk) ISBN: 978-1-032-11480-4 (pbk) ISBN: 978-1-351-18967-5 (ebk)","authors":"Debashis Ghosh","doi":"10.1111/insr.12600","DOIUrl":"https://doi.org/10.1111/insr.12600","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"92 3","pages":"485-486"},"PeriodicalIF":1.7,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142579760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handling Out‐of‐Sample Areas to Estimate the Unemployment Rate at Local Labour Market Areas in Italy","authors":"Roberto Benedetti, Federica Piersimoni, Monica Pratesi, Nicola Salvati, Thomas Suesse","doi":"10.1111/insr.12596","DOIUrl":"https://doi.org/10.1111/insr.12596","url":null,"abstract":"Unemployment rate estimates for small areas are used to efficiently support the distribution of services and the allocation of resources, grants and funding. A Fay–Herriot type model is the most used tool to obtain these estimates. Under this approach out‐of‐sample areas require some synthetic estimates. As the geographical context is extremely important for analysing local economies, in this paper, we allow for area random effects to be spatially correlated. The spatial model parameters are estimated by a marginal likelihood method and are used to predict in‐sample as well as out‐of‐sample areas. Extensive simulation experiments are used to assess the impact of the auto‐regression parameter and of the rate of out‐of‐sample areas on the performance of this approach. The paper concludes with an illustrative application on real data from the Italian Labour Force Survey in which the estimation of the unemployment rate in each Local Labour Market Area is addressed.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"60 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Frequency and Probability Weights: An In‐Depth Look at Duelling Weights","authors":"Tuo Lin, Ruohui Chen, Jinyuan Liu, Tsungchin Wu, Toni T. Gui, Yangyi Li, Xinyi Huang, Kun Yang, Guanqing Chen, Tian Chen, David R. Strong, Karen Messer, Xin M. Tu","doi":"10.1111/insr.12594","DOIUrl":"https://doi.org/10.1111/insr.12594","url":null,"abstract":"Probability weights have been widely used in addressing selection bias arising from a variety of contexts. Common examples of probability weights include sampling weights, missing data weights, and propensity score weights. Frequency weights, which are used to control for varying variabilities of aggregated outcomes, are both conceptually and analytically different from probability weights. Popular software such as R, SAS and STATA support both types of weights. Many users, including professional statisticians, become bewildered when they see identical estimates, but different standard errors and p‐values when probability weights are treated as frequency weights. Some even completely ignore the difference between the two types of weights and treat them as the same. Although a large body of literature exists on each type of weights, we have found little, if any, discussion that provides head‐to‐head comparisons of the two types of weights and associated inference methods. In this paper, we unveil the conceptual and analytic differences between the two types of weights within the context of parametric and semi‐parametric generalised linear models (GLM) and discuss valid inference for each type of weights. To the best of our knowledge, this is the first paper that looks into such differences by identifying the conditions under which the two types of weights can be treated the same analytically and providing clear guidance on the appropriate statistical models and inference procedures for each type of weights. We illustrate these considerations using real study data.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"32 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Longitudinal Data: A Review of Methods and Software Packages","authors":"Zihang Lu","doi":"10.1111/insr.12588","DOIUrl":"https://doi.org/10.1111/insr.12588","url":null,"abstract":"Clustering of longitudinal data is becoming increasingly popular in many fields such as social sciences, business, environmental science, medicine and healthcare. However, it is often challenging due to the complex nature of the data, such as dependencies between observations collected over time, missingness, sparsity and non‐linearity, making it difficult to identify meaningful patterns and relationships among the data. Despite the increasingly common application of cluster analysis for longitudinal data, many existing methods are still less known to researchers, and limited guidance is provided in choosing between methods and software packages. In this paper, we review several commonly used methods for clustering longitudinal data. These methods are broadly classified into three categories, namely, model‐based approaches, algorithm‐based approaches and functional clustering approaches. We perform a comparison among these methods and their corresponding R software packages using real‐life datasets and simulated datasets under various conditions. Findings from the analyses and recommendations for using these approaches in practice are discussed.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"12 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alternative Approaches for Estimating Highest‐Density Regions","authors":"Nina Deliu, Brunero Liseo","doi":"10.1111/insr.12592","DOIUrl":"https://doi.org/10.1111/insr.12592","url":null,"abstract":"Among the variety of statistical intervals, highest‐density regions (HDRs) stand out for their ability to effectively summarise a distribution or sample, unveiling its distinctive and salient features. An HDR represents the minimum size set that satisfies a certain probability coverage, and current methods for their computation require knowledge or estimation of the underlying probability distribution or density. In this work, we illustrate a broader framework for computing HDRs, which generalises the classical density quantile method. The framework is based on neighbourhood measures, that is, measures that preserve the order induced in the sample by the density, and include the density as a special case. We explore a number of suitable distance‐based measures, such as the k‐nearest neighbourhood distance, and some probabilistic variants based on copula models. An extensive comparison is provided, showing the advantages of the copula‐based strategy, especially in those scenarios that exhibit complex structures (e.g. multimodalities or particular dependencies). Finally, we discuss the practical implications of our findings for estimating HDRs in real‐world applications.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"33 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Multivariate Mixture Models: A Comprehensive Approach for Modeling Mixtures of Non‐Identical Distributions","authors":"Samyajoy Pal, Christian Heumann","doi":"10.1111/insr.12593","DOIUrl":"https://doi.org/10.1111/insr.12593","url":null,"abstract":"The mixture models are widely used to analyze data with cluster structures and the mixture of Gaussians is most common in practical applications. The use of mixtures involving other multivariate distributions, like the multivariate skew normal and multivariate generalised hyperbolic, is also found in the literature. However, in all such cases, only the mixtures of identical distributions are used to form a mixture model. We present an innovative and versatile approach for constructing mixture models involving identical and non‐identical distributions combined in all conceivable permutations (e.g. a mixture of multivariate skew normal and multivariate generalised hyperbolic). We also establish any conventional mixture model as a distinctive particular case of our proposed framework. The practical efficacy of our model is shown through its application to both simulated and real‐world data sets. Our comprehensive and flexible model excels at recognising inherent patterns and accurately estimating parameters.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"23 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Analysis of Data Repeatability Measures","authors":"Zeyi Wang, Eric Bridgeford, Shangsi Wang, Joshua T. Vogelstein, Brian Caffo","doi":"10.1111/insr.12591","DOIUrl":"https://doi.org/10.1111/insr.12591","url":null,"abstract":"The advent of modern data collection and processing techniques has seen the size, scale and complexity of data grow exponentially. A seminal step in leveraging these rich datasets for downstream inference is understanding the characteristics of the data which are repeatable—the aspects of the data that are able to be identified under duplicated analyses. Conflictingly, the utility of traditional repeatability measures, such as the intra‐class correlation coefficient, under these settings is limited. In recent work, novel data repeatability measures have been introduced in the context where a set of subjects are measured twice or more, including: fingerprinting, rank sums and generalisations of the intra‐class correlation coefficient. However, the relationships between, and the best practices among, these measures remain largely unknown. In this manuscript, we formalise a novel repeatability measure, discriminability. We show that it is deterministically linked with the intra‐class correlation coefficients under univariate random effect models and has the desired property of optimal accuracy for inferential tasks using multivariate measurements. Additionally, we overview and systematically compare existing repeatability statistics with discriminability, using both theoretical results and simulations. We show that the rank sum statistic is deterministically linked to a consistent estimator of discriminability. The statistical power of permutation tests derived from these measures is compared numerically under Gaussian and non‐Gaussian settings, with and without simulated batch effects. Motivated by both theoretical and empirical results, we provide methodological recommendations for each benchmark setting to serve as a resource for future analyses. We believe these recommendations will play an important role towards improving repeatability in fields such as functional magnetic resonance imaging, genomics, pharmacology and more.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":"29 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}