arXiv - STAT - Statistics Theory: Latest Articles

RandALO: Out-of-sample risk estimation in no time flat
arXiv - STAT - Statistics Theory Pub Date : 2024-09-15 DOI: arxiv-2409.09781
Parth T. Nobel, Daniel LeJeune, Emmanuel J. Candès
Abstract: Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO available on PyPI as randalo and at https://github.com/cvxgrp/randalo.
Citations: 0
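The cost trade-off between $K$-fold and leave-one-out CV mentioned in the abstract above can be seen in a toy case. The following is a minimal sketch, assuming a one-predictor ridge regression without intercept (an illustrative choice, not taken from the paper). For this model the exact leave-one-out residual follows from a single fit as $r_i / (1 - h_i)$ with $h_i = x_i^2 / (\sum_j x_j^2 + \lambda)$. This is not the RandALO estimator and does not use the randalo package; all function names are hypothetical.

```python
def ridge_fit(xs, ys, lam):
    """Ridge coefficient for a single predictor with no intercept."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def loo_risk_brute_force(xs, ys, lam):
    """Leave-one-out squared-error risk by refitting n times."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        beta = ridge_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        total += (ys[i] - xs[i] * beta) ** 2
    return total / n


def loo_risk_shortcut(xs, ys, lam):
    """Same quantity from one fit, using the leverage correction."""
    beta = ridge_fit(xs, ys, lam)
    denom = sum(x * x for x in xs) + lam
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = x * x / denom  # h_i, strictly below 1 when lam > 0
        total += ((y - x * beta) / (1.0 - leverage)) ** 2
    return total / len(xs)


if __name__ == "__main__":
    xs = [0.5, -1.2, 2.0, 0.3, -0.7, 1.5]
    ys = [1.1, -2.0, 4.3, 0.2, -1.6, 2.9]
    print(loo_risk_brute_force(xs, ys, lam=1.0))
    print(loo_risk_shortcut(xs, ys, lam=1.0))
```

The two functions agree up to floating-point error, but the shortcut needs one fit instead of n. Approximate leave-one-out methods aim for a similar saving in models where no exact closed form exists.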
Accuracy of the Ensemble Kalman Filter in the Near-Linear Setting
arXiv - STAT - Statistics Theory Pub Date : 2024-09-15 DOI: arxiv-2409.09800
Edoardo Calvello, Pierre Monmarché, Andrew M. Stuart, Urbain Vaes
Abstract: The filtering distribution captures the statistics of the state of a dynamical system from partial and noisy observations. Classical particle filters provably approximate this distribution in quite general settings; however they behave poorly for high dimensional problems, suffering weight collapse. This issue is circumvented by the ensemble Kalman filter which is an equal-weight interacting particle system. However, this finite particle system is only proven to approximate the true filter in the linear Gaussian case. In practice, however, it is applied in much broader settings; as a result, establishing its approximation properties more generally is important. There has been recent progress in the theoretical analysis of the algorithm, establishing stability and error estimates in non-Gaussian settings, but the assumptions on the dynamics and observation models rule out the unbounded vector fields that arise in practice and the analysis applies only to the mean field limit of the ensemble Kalman filter. The present work establishes error bounds between the filtering distribution and the finite particle ensemble Kalman filter when the model exhibits linear growth.
Citations: 0
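To make "equal-weight interacting particle system" in the abstract above concrete, here is a minimal sketch of one ensemble Kalman filter analysis step. It assumes a scalar state, a linear observation $y = h x + \text{noise}$ with noise variance `r`, and the perturbed-observation variant of the update; these choices and the function name are illustrative assumptions, not the setting analysed in the paper.

```python
import random


def enkf_analysis(ensemble, y_obs, h, r, rng):
    """One perturbed-observation EnKF update for a scalar state.

    Every member keeps weight 1/n; members interact only through the
    shared sample variance that sets the Kalman gain.
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = var * h / (h * h * var + r)
    noise_sd = r ** 0.5
    return [
        x + gain * (y_obs + rng.gauss(0.0, noise_sd) - h * x)
        for x in ensemble
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    forecast = [rng.gauss(0.0, 2.0) for _ in range(50)]
    analysis = enkf_analysis(forecast, y_obs=1.0, h=1.0, r=0.25, rng=rng)
    print(sum(forecast) / len(forecast), sum(analysis) / len(analysis))
```

Unlike a particle filter, no importance weights are computed, so there is nothing to collapse; the price is that the update is exact only in the linear Gaussian case.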
Asymptotics for irregularly observed long memory processes
arXiv - STAT - Statistics Theory Pub Date : 2024-09-14 DOI: arxiv-2409.09498
Mohamedou Ould-Haye, Anne Philippe
Abstract: We study the effect of observing a stationary process at irregular time points via a renewal process. We establish a sharp difference in the asymptotic behaviour of the self-normalized sample mean of the observed process depending on the renewal process. In particular, we show that if the renewal process has a moderate heavy tail distribution then the limit is a so-called Normal Variance Mixture (NVM) and we characterize the randomized variance part of the limiting NVM as an integral function of a Lévy stable motion. Otherwise, the normalized sample mean will be asymptotically normal.
Citations: 0
A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation and Blackwell's Theorem
arXiv - STAT - Statistics Theory Pub Date : 2024-09-14 DOI: arxiv-2409.09558
Weijie J. Su
Abstract: Differential privacy is widely considered the formal privacy for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Despite originating in the cryptographic context, in this review paper we argue that, fundamentally, differential privacy can be considered a \textit{pure} statistical concept. By leveraging a theorem due to David Blackwell, our focus is to demonstrate that the definition of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of $f$-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render $f$-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and U.S. Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared to existing alternatives.
Citations: 0
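The hypothesis-testing view in the abstract above describes privacy through a trade-off function: the smallest type II error an adversary can reach at type I error level $\alpha$ when testing whether one individual's record is in the data. The sketch below evaluates two trade-off functions that are standard in the $f$-differential privacy literature but are not written out in the abstract: the Gaussian trade-off $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$ and the $(\varepsilon,\delta)$ trade-off $f_{\varepsilon,\delta}(\alpha) = \max\{0,\ 1-\delta-e^{\varepsilon}\alpha,\ e^{-\varepsilon}(1-\delta-\alpha)\}$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def gaussian_tradeoff(alpha, mu):
    """Type II error of the optimal level-alpha test of N(0,1) vs N(mu,1)."""
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def eps_delta_tradeoff(alpha, eps, delta):
    """Trade-off function corresponding to (eps, delta)-differential privacy."""
    return max(
        0.0,
        1.0 - delta - math.exp(eps) * alpha,
        math.exp(-eps) * (1.0 - delta - alpha),
    )


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.25, 0.5):
        print(alpha, gaussian_tradeoff(alpha, 1.0),
              eps_delta_tradeoff(alpha, 1.0, 1e-5))
```

A larger trade-off value at a given `alpha` means the adversary's test is less powerful, so the mechanism is more private. With `mu = 0` or `eps = delta = 0` both functions reduce to `1 - alpha`, the case in which the two hypotheses cannot be told apart.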
A Random-effects Approach to Regression Involving Many Categorical Predictors and Their Interactions
arXiv - STAT - Statistics Theory Pub Date : 2024-09-14 DOI: arxiv-2409.09355
Hanmei Sun, Jiangshan Zhang, Jiming Jiang
Abstract: Linear model prediction with a large number of potential predictors is both statistically and computationally challenging. The traditional approaches are largely based on shrinkage selection/estimation methods, which are applicable even when the number of potential predictors is (much) larger than the sample size. A situation of the latter scenario occurs when the candidate predictors involve many binary indicators corresponding to categories of some categorical predictors as well as their interactions. We propose an alternative approach to the shrinkage prediction methods in such a case based on mixed model prediction, which effectively treats combinations of the categorical effects as random effects. We establish theoretical validity of the proposed method, and demonstrate empirically its advantage over the shrinkage methods. We also develop measures of uncertainty for the proposed method and evaluate their performance empirically. A real-data example is considered.
Citations: 0
Bounding the probability of causality under ordinal outcomes
arXiv - STAT - Statistics Theory Pub Date : 2024-09-14 DOI: arxiv-2409.09297
Hanmei Sun, Chengfeng Shi, Qiang Zhao
Abstract: The probability of causation (PC) is often used in liability assessments. In a legal context, for example, where a patient suffered the side effect after taking a medication and sued the pharmaceutical company as a result, the value of the PC can help assess the likelihood that the side effect was caused by the medication, in other words, how likely it is that the patient will win the case. Beyond the issue of legal disputes, the PC plays an equally large role when one wants to go about explaining causal relationships between events that have already occurred in other areas. This article begins by reviewing the definitions and bounds of the probability of causality for binary outcomes, then generalizes them to ordinal outcomes. It demonstrates that incorporating additional mediator variable information in a complete mediation analysis provides a more refined bound compared to the simpler scenario where only exposure and outcome variables are considered.
Citations: 0
The Asymptotics of Wide Remedians
arXiv - STAT - Statistics Theory Pub Date : 2024-09-14 DOI: arxiv-2409.09528
Philip T. Labo
Abstract: The remedian uses a $k\times b$ matrix to approximate the median of $n\leq b^{k}$ streaming input values by recursively replacing buffers of $b$ values with their medians, thereby ignoring its $200(\lceil b/2\rceil / b)^{k}\%$ most extreme inputs. Rousseeuw & Bassett (1990) and Chao & Lin (1993); Chen & Chen (2005) study the remedian's distribution as $k\rightarrow\infty$ and as $k,b\rightarrow\infty$. The remedian's breakdown point vanishes as $k\rightarrow\infty$, but approaches $(1/2)^{k}$ as $b\rightarrow\infty$. We study the remedian's robust-regime distribution as $b\rightarrow\infty$, deriving a normal distribution for standardized (mean, median, remedian, remedian rank) as $b\rightarrow\infty$, thereby illuminating the remedian's accuracy in approximating the sample median. We derive the asymptotic efficiency of the remedian relative to the mean and the median. Finally, we discuss the estimation of more than one quantile at once, proposing an asymptotic distribution for the random vector that results when we apply remedian estimation in parallel to the components of i.i.d. random vectors.
Citations: 0
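The buffer recursion described in the abstract above can be sketched in a few lines. This is a minimal sketch, assuming exactly $n = b^{k}$ inputs (the abstract allows $n \leq b^{k}$; handling a partially filled matrix is left out here) and using a hypothetical function name.

```python
from statistics import median


def remedian(values, b, k):
    """Approximate the median of exactly b**k streamed values.

    Uses k buffers of size b: whenever a buffer fills, its median is
    pushed to the next buffer and the full buffer is emptied.
    """
    buffers = [[] for _ in range(k)]
    count = 0
    for v in values:
        count += 1
        buffers[0].append(v)
        level = 0
        # Cascade medians upward while a non-final buffer is full.
        while level < k - 1 and len(buffers[level]) == b:
            m = median(buffers[level])
            buffers[level].clear()
            level += 1
            buffers[level].append(m)
    if count != b ** k:
        raise ValueError("this sketch requires exactly b**k inputs")
    return median(buffers[k - 1])


if __name__ == "__main__":
    print(remedian([9, 1, 5, 2, 8, 3, 7, 4, 6], b=3, k=2))  # buffer medians 5, 3, 6
    print(remedian([1, 2, 9, 3, 4, 8, 5, 6, 7], b=3, k=2))  # buffer medians 2, 4, 6
```

Only $k \cdot b$ values are held in memory at any time. The second call returns 4 although the sample median is 5, which shows that the remedian is an approximation whose accuracy depends on $b$ and $k$.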
Locally sharp goodness-of-fit testing in sup norm for high-dimensional counts
arXiv - STAT - Statistics Theory Pub Date : 2024-09-13 DOI: arxiv-2409.08871
Subhodh Kotekal, Julien Chhor, Chao Gao
Abstract: We consider testing the goodness-of-fit of a distribution against alternatives separated in sup norm. We study the twin settings of Poisson-generated count data with a large number of categories and high-dimensional multinomials. In previous studies of different separation metrics, it has been found that the local minimax separation rate exhibits substantial heterogeneity and is a complicated function of the null distribution; the rate-optimal test requires careful tailoring to the null. In the setting of sup norm, this remains the case and we establish that the local minimax separation rate is determined by the finer decay behavior of the category rates. The upper bound is obtained by a test involving the sample maximum, and the lower bound argument involves reducing the original heteroskedastic null to an auxiliary homoskedastic null determined by the decay of the rates. Further, in a particular asymptotic setup, the sharp constants are identified.
Citations: 0
High-dimensional regression with a count response
arXiv - STAT - Statistics Theory Pub Date : 2024-09-13 DOI: arxiv-2409.08821
Or Zilberman, Felix Abramovich
Abstract: We consider high-dimensional regression with a count response modeled by Poisson or negative binomial generalized linear model (GLM). We propose a penalized maximum likelihood estimator with a properly chosen complexity penalty and establish its adaptive minimaxity across models of various sparsity. To make the procedure computationally feasible for high-dimensional data we consider its LASSO and SLOPE convex surrogates. Their performance is illustrated through simulated and real-data examples.
Citations: 0
Self-Organized State-Space Models with Artificial Dynamics
arXiv - STAT - Statistics Theory Pub Date : 2024-09-13 DOI: arxiv-2409.08928
Yuan Chen, Mathieu Gerber, Christophe Andrieu, Randal Douc
Abstract: In this paper we consider a state-space model (SSM) parametrized by some parameter $\theta$, and our aim is to perform joint parameter and state inference. A simple idea to perform this task, which almost dates back to the origin of the Kalman filter, is to replace the static parameter $\theta$ by a Markov chain $(\theta_t)_{t\geq 0}$ on the parameter space and then to apply a standard filtering algorithm to the extended, or self-organized SSM. However, the practical implementation of this idea in a theoretically justified way has remained an open problem. In this paper we fill this gap by introducing various possible constructions of the Markov chain $(\theta_t)_{t\geq 0}$ that ensure the validity of the self-organized SSM (SO-SSM) for joint parameter and state inference. Notably, we show that theoretically valid SO-SSMs can be defined even if $\|\mathrm{Var}(\theta_{t}|\theta_{t-1})\|$ converges to 0 slowly as $t\rightarrow\infty$. This result is important since, as illustrated in our numerical experiments, such models can be efficiently approximated using standard particle filter algorithms. While the idea studied in this work was first introduced for online inference in SSMs, it has also been proved to be useful for computing the maximum likelihood estimator (MLE) of a given SSM, since iterated filtering algorithms can be seen as particle filters applied to SO-SSMs for which the target parameter value is the MLE of interest. Based on this observation, we also derive constructions of $(\theta_t)_{t\geq 0}$ and theoretical results tailored to these specific applications of SO-SSMs, and as a result, we introduce new iterated filtering algorithms. From a practical point of view, the algorithms introduced in this work have the merit of being simple to implement and only requiring minimal tuning to perform well.
Citations: 0
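The idea in the abstract above, replacing the static parameter by a Markov chain and filtering the extended state, can be sketched with a bootstrap particle filter. Everything specific below is an assumption made for illustration and is not one of the paper's constructions: the model is a scalar AR(1) state $x_t = \theta x_{t-1} + \varepsilon_t$ observed as $y_t = x_t + \eta_t$ with unit-variance Gaussian noises, the artificial dynamics are a Gaussian random walk on $\theta$ with standard deviation `c / t**a`, and the function name is hypothetical.

```python
import math
import random


def so_ssm_filter(ys, n_particles=500, seed=0, c=0.1, a=0.5):
    """Bootstrap particle filter on the extended state (theta_t, x_t).

    Returns the filtering-mean estimate of theta after each observation.
    """
    rng = random.Random(seed)
    thetas = [rng.uniform(-1.0, 1.0) for _ in range(n_particles)]
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for t, y in enumerate(ys, start=1):
        # Artificial dynamics: random walk on theta whose spread shrinks with t.
        step = c / t ** a
        thetas = [th + step * rng.gauss(0.0, 1.0) for th in thetas]
        # State transition of the assumed AR(1) model.
        xs = [th * x + rng.gauss(0.0, 1.0) for th, x in zip(thetas, xs)]
        # Gaussian observation log-likelihood, shifted by its max for stability.
        logw = [-0.5 * (y - x) * (y - x) for x in xs]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        estimates.append(sum(wi * th for wi, th in zip(w, thetas)) / total)
        # Multinomial resampling of the joint (theta, x) particles.
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        thetas = [thetas[i] for i in idx]
        xs = [xs[i] for i in idx]
    return estimates


if __name__ == "__main__":
    rng = random.Random(42)
    true_theta, x, ys = 0.7, 0.0, []
    for _ in range(200):
        x = true_theta * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    print(so_ssm_filter(ys)[-1])
```

The decay exponent `a` controls how quickly the artificial noise on $\theta$ vanishes: a slow decay keeps the parameter particles diverse for longer, which is the regime the abstract identifies as practical for standard particle filters.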