Title: Distributed inference for two‐sample U‐statistics in massive data analysis
Authors: Bingyao Huang, Yanyan Liu, Liuhua Peng
Journal: Scandinavian Journal of Statistics, volume 50 1, pages 1090 - 1115
Publication date: 2022-10-06
Publication type: Journal Article
DOI: 10.1111/sjos.12620
URL: https://doi.org/10.1111/sjos.12620
Periodical IF: 0.8
JCR: Q3 (Statistics & Probability)
Region: 4 (category: Mathematics)
Open access: no
Platform: Semanticscholar
Citation count: 2
Abstract
This paper considers distributed inference for two‐sample U‐statistics in the massive data setting. To reduce computational complexity, it proposes distributed two‐sample U‐statistics and blockwise linear two‐sample U‐statistics. The blockwise linear two‐sample U‐statistic, which incurs lower communication cost, is more computationally efficient, especially when the data are stored in different locations. The asymptotic properties of both types of distributed two‐sample U‐statistics are established. In addition, the paper proposes bootstrap algorithms to approximate the distributions of distributed two‐sample U‐statistics and blockwise linear two‐sample U‐statistics in both the nondegenerate and degenerate cases. The distributed weighted bootstrap for the distributed two‐sample U‐statistic is new in the literature. The proposed bootstrap procedures are computationally efficient, come with theoretical guarantees, and are suitable for distributed computing platforms. Extensive numerical studies illustrate that the proposed distributed approaches are feasible and effective.
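To make the divide-and-conquer idea concrete, here is a minimal Python sketch. It is not the authors' exact construction, which the abstract does not spell out. It assumes a degree-(1, 1) kernel (the Mann-Whitney kernel 1{x < y} is chosen purely for illustration), splits both samples into the same number of equal-sized blocks, and averages the per-block U-statistics. The function names `two_sample_u`, `distributed_two_sample_u`, and `mann_whitney_kernel` are hypothetical.

```python
import numpy as np


def two_sample_u(x, y, kernel):
    """Full two-sample U-statistic of degree (1, 1): mean of kernel(x_i, y_j) over all pairs."""
    return float(kernel(x[:, None], y[None, :]).mean())


def distributed_two_sample_u(x, y, kernel, n_blocks):
    """Illustrative distributed version (an assumption, not the paper's definition):
    pair the k-th block of x with the k-th block of y, compute a U-statistic on
    each pair, and average the results. Assumes roughly equal block sizes."""
    x_blocks = np.array_split(x, n_blocks)
    y_blocks = np.array_split(y, n_blocks)
    block_values = [two_sample_u(xb, yb, kernel) for xb, yb in zip(x_blocks, y_blocks)]
    return float(np.mean(block_values))


def mann_whitney_kernel(a, b):
    """Kernel h(x, y) = 1{x < y}; the U-statistic then estimates P(X < Y)."""
    return (a < b).astype(float)


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)  # synthetic sample 1
y = rng.normal(0.5, 1.0, size=1500)  # synthetic sample 2

full = two_sample_u(x, y, mann_whitney_kernel)
dist = distributed_two_sample_u(x, y, mann_whitney_kernel, n_blocks=10)
print(f"full U-statistic: {full:.4f}, distributed (10 blocks): {dist:.4f}")
```

In this sketch the full statistic evaluates the kernel on all 1000 × 1500 pairs, while the distributed version evaluates it only on pairs within matching blocks (10 × 100 × 150 pairs), and each block can be processed on a separate machine that returns a single number.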
About the journal:
The Scandinavian Journal of Statistics is internationally recognised as one of the leading statistical journals in the world. It was founded in 1974 by four Scandinavian statistical societies. Today more than eighty per cent of the manuscripts are submitted from outside Scandinavia.
It is an international journal devoted to reporting significant and innovative original contributions to statistical methodology, both theory and applications.
The journal specializes in statistical modelling showing particular appreciation of the underlying substantive research problems.
The emergence of specialized methods for analysing longitudinal and spatial data is just one example of an area of important methodological development in which the Scandinavian Journal of Statistics has a particular niche.