Change point detection in high dimensional data with U-statistics
B. Cooper Boniece, Lajos Horváth, Peter M. Jacobs
Test, published 2023-12-07. DOI: 10.1007/s11749-023-00900-y (https://doi.org/10.1007/s11749-023-00900-y)
Impact factor: 1.3. JCR: Q2 (Statistics & Probability).
Citation count: 1
Abstract
We consider the problem of detecting distributional changes in a sequence of high dimensional data. Our approach combines two separate statistics stemming from \(L_p\) norms whose behavior is similar under \(H_0\) but potentially different under \(H_A\), leading to a testing procedure that is flexible against a variety of alternatives. We establish the asymptotic distribution of our proposed test statistics separately in cases of weakly dependent and strongly dependent coordinates as \(\min \{N,d\}\rightarrow \infty \), where N denotes sample size and d is the dimension, and establish consistency of testing and estimation procedures in high dimensions under one-change alternative settings. Computational studies in single and multiple change point scenarios demonstrate that our method can outperform other nonparametric approaches in the literature for certain alternatives in high dimensions. We illustrate our approach through an application to Twitter data concerning the mentions of U.S. governors.
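To make the general idea of a distance-based change point scan concrete, the following is a minimal sketch and not the statistic proposed in the paper. It assumes a single change, uses pairwise \(L_p\) distances between observations, and compares within-segment and between-segment averages in the style of an energy distance. The function names `pairwise_lp` and `scan_change_point`, the weight \(k(N-k)/N^2\), and the `min_seg` parameter are all illustrative assumptions.

```python
import numpy as np


def pairwise_lp(X, p=2.0):
    """Return the N x N matrix of L_p distances between the rows of X."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)


def scan_change_point(X, p=2.0, min_seg=2):
    """Scan candidate split points with an energy-distance-style contrast.

    Illustrative only: this is NOT the test statistic from the paper.
    Returns the estimated change location k_hat (the second segment
    starts at row k_hat) and the array of scan statistics.
    """
    N = X.shape[0]
    D = pairwise_lp(X, p)
    stats = np.full(N, -np.inf)
    for k in range(min_seg, N - min_seg + 1):
        m = N - k
        # U-statistic-type averages over distinct pairs (the diagonal is zero).
        within_before = D[:k, :k].sum() / (k * (k - 1))
        within_after = D[k:, k:].sum() / (m * (m - 1))
        between = D[:k, k:].mean()
        # Assumed weight that downweights splits near the boundaries.
        weight = k * m / N ** 2
        stats[k] = weight * (2.0 * between - within_before - within_after)
    return int(np.argmax(stats)), stats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    X[50:] += 1.0  # synthetic mean shift starting at observation 50
    k_hat, _ = scan_change_point(X)
    print("estimated change point:", k_hat)
```

The sketch stores the full distance array, so memory grows as \(N^2 d\); it is intended for small illustrative inputs only, and it does not provide critical values or the asymptotic calibration discussed in the abstract.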
About the journal:
TEST is an international journal of Statistics and Probability, sponsored by the Spanish Society of Statistics and Operations Research. English is the official language of the journal.
The emphasis of TEST is placed on papers containing original theoretical contributions of direct or potential value in applications. In this respect, the methodological contents are considered to be crucial for the papers published in TEST, but the practical implications of the methodological aspects are also relevant. Original sound manuscripts on either well-established or emerging areas in the scope of the journal are welcome.
One volume is published annually in four issues. In addition to the regular contributions, each issue of TEST contains an invited paper from a world-wide recognized outstanding statistician on an up-to-date challenging topic, including discussions.