{"title":"Benign Overfitting for $α$ Sub-exponential Input","authors":"Kota Okudo, Kei Kobayashi","doi":"arxiv-2409.00733","DOIUrl":"https://doi.org/arxiv-2409.00733","url":null,"abstract":"This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions. We extend the analysis of maximum margin classifiers to $\\alpha$ sub-exponential distributions, where $\\alpha \\in (0,2]$, generalizing previous work that focused on sub-Gaussian inputs. Our main result provides generalization error bounds for linear classifiers trained using gradient descent on unregularized logistic loss in this heavy-tailed setting. We prove that under certain conditions on the dimensionality $p$ and feature vector magnitude $\\|\\mu\\|$, the misclassification error of the maximum margin classifier asymptotically approaches the noise level. This work contributes to the understanding of benign overfitting in more robust distribution settings and demonstrates that the phenomenon persists even with heavier-tailed inputs than previously studied.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact Exploratory Bi-factor Analysis: A Constraint-based Optimisation Approach","authors":"Jiawei Qiao, Yunxiao Chen, Zhiliang Ying","doi":"arxiv-2409.00679","DOIUrl":"https://doi.org/arxiv-2409.00679","url":null,"abstract":"Bi-factor analysis is a form of confirmatory factor analysis widely used in psychological and educational measurement. The use of a bi-factor model requires the specification of an explicit bi-factor structure on the relationship between the observed variables and the group factors. In practice, the bi-factor structure is sometimes unknown, in which case an exploratory form of bi-factor analysis is needed to find the bi-factor structure. Unfortunately, there are few methods for exploratory bi-factor analysis, with the exception of a rotation-based method proposed in Jennrich and Bentler (2011, 2012). However, this method only finds approximate bi-factor structures, as it does not yield an exact bi-factor loading structure, even after applying hard thresholding. In this paper, we propose a constraint-based optimisation method that learns an exact bi-factor loading structure from data, overcoming the issue with the rotation-based method. The key to the proposed method is a mathematical characterisation of the bi-factor loading structure as a set of equality constraints, which allows us to formulate the exploratory bi-factor analysis problem as a constrained optimisation problem in a continuous domain and solve the optimisation problem with an augmented Lagrangian method. The power of the proposed method is shown via simulation studies and a real data example. Extending the proposed method to exploratory hierarchical factor analysis is also discussed. The code is available at https://anonymous.4open.science/r/Bifactor-ALM-C1E6.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural adaptation via directional regularity: rate accelerated estimation in multivariate functional data","authors":"Omar Kassi, Sunny G. W. Wang","doi":"arxiv-2409.00817","DOIUrl":"https://doi.org/arxiv-2409.00817","url":null,"abstract":"We introduce directional regularity, a new definition of anisotropy for multivariate functional data. Instead of taking the conventional view, which determines anisotropy as a notion of smoothness along a dimension, directional regularity additionally views anisotropy through the lens of directions. We show that faster rates of convergence can be obtained through a change-of-basis by adapting to the directional regularity of a multivariate process. An algorithm for the estimation and identification of the change-of-basis matrix is constructed, made possible by the unique replication structure of functional data. Non-asymptotic bounds are provided for our algorithm, supplemented by numerical evidence from an extensive simulation study. We discuss two possible applications of the directional regularity approach, and advocate its consideration as a standard pre-processing step in multivariate functional data analysis.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentially Private Synthetic High-dimensional Tabular Stream","authors":"Girish Kumar, Thomas Strohmer, Roman Vershynin","doi":"arxiv-2409.00322","DOIUrl":"https://doi.org/arxiv-2409.00322","url":null,"abstract":"While differentially private synthetic data generation has been explored extensively in the literature, how to update this data in the future if the underlying private data changes is much less understood. We propose an algorithmic framework for streaming data that generates multiple synthetic datasets over time, tracking changes in the underlying private data. Our algorithm satisfies differential privacy for the entire input stream (continual differential privacy) and can be used for high-dimensional tabular data. Furthermore, we show the utility of our method via experiments on real-world datasets. The proposed algorithm builds upon a popular select, measure, fit, and iterate paradigm (used by offline synthetic data generation algorithms) and private counters for streams.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive smoothness of function estimation in the three classical problems of the non-parametrical statistic","authors":"M. R. Formica, E. Ostrovsky, L. Sirota","doi":"arxiv-2409.00491","DOIUrl":"https://doi.org/arxiv-2409.00491","url":null,"abstract":"In this short report we offer the so-called adaptive functional smoothness estimation, in the Hilbert space norm sense, for the three classical problems of non-parametric statistics: regression, density, and spectral (density) function measurement (estimation).","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"140 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator","authors":"Victor Ryan, Alexis Derumigny","doi":"arxiv-2408.17087","DOIUrl":"https://doi.org/arxiv-2408.17087","url":null,"abstract":"Elliptical distributions are a simple and flexible class of distributions that depend on a one-dimensional function, called the density generator. In this article, we study the non-parametric estimator of this generator that was introduced by Liebscher (2005). This estimator depends on two tuning parameters: a bandwidth $h$ (as usual in kernel smoothing) and an additional parameter $a$ that controls the behavior near the center of the distribution. We give an explicit expression for the asymptotic MSE at a point $x$, and derive explicit expressions for the optimal tuning parameters $h$ and $a$. Estimation of the derivatives of the generator is also discussed. A simulation study shows the performance of the new methods.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"144 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional Sieve Bootstrap for the Partial Sum Process with Application to Change-Point Detection without Dimension Reduction","authors":"Efstathios Paparoditis, Lea Wegner, Martin Wendler","doi":"arxiv-2408.05071","DOIUrl":"https://doi.org/arxiv-2408.05071","url":null,"abstract":"Change-points in functional time series can be detected using the CUSUM statistic, which is a non-linear functional of the partial sum process. Various methods have been proposed to obtain critical values for this statistic. In this paper we use the functional autoregressive sieve bootstrap to imitate the behavior of the partial sum process, and we show that this procedure asymptotically correctly estimates critical values under the null hypothesis. We also establish the consistency of the corresponding bootstrap-based test under local alternatives. The finite-sample performance of the procedure is studied via simulations under the null hypothesis and under the alternative.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of the parameters of complex constitutive models: Least squares minimization vs. Bayesian updating","authors":"Thomas Most","doi":"arxiv-2408.04928","DOIUrl":"https://doi.org/arxiv-2408.04928","url":null,"abstract":"In this study the common least-squares minimization approach is compared to the Bayesian updating procedure. In the context of material parameter identification, the posterior parameter density function is obtained from its prior and the likelihood function of the measurements. By using Markov Chain Monte Carlo methods, such as the Metropolis-Hastings algorithm (Hastings, 1970), the global density function, including local peaks, can be computed. Thus this procedure enables an accurate evaluation of the global parameter quality. However, the computational effort is considerably larger than that of the minimization approach. Therefore, several methodologies for an efficient approximation of the likelihood function are discussed in the present study.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variance-based sensitivity analysis in the presence of correlated input variables","authors":"Thomas Most","doi":"arxiv-2408.04933","DOIUrl":"https://doi.org/arxiv-2408.04933","url":null,"abstract":"In this paper we propose an extension of the classical Sobol' estimator for the estimation of variance-based sensitivity indices. The approach assumes a linear correlation model between the input variables, which is used to decompose the contribution of an input variable into a correlated and an uncorrelated part. This method provides sampling matrices following the original joint probability distribution, which are used directly to compute the model output without any assumptions or approximations of the model response function.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network and interaction models for data with hierarchical granularity via fragmentation and coagulation","authors":"Lancelot F. James, Juho Lee, Nathan Ross","doi":"arxiv-2408.04866","DOIUrl":"https://doi.org/arxiv-2408.04866","url":null,"abstract":"We introduce a nested family of Bayesian nonparametric models for network and interaction data with a hierarchical granularity structure that naturally arises through finer and coarser population labelings. In the case of network data, the structure is easily visualized by merging and shattering vertices, while respecting the edge structure. We further develop Bayesian inference procedures for the model family, and apply them to synthetic and real data. The family provides a connection of practical and theoretical interest between the Hollywood model of Crane and Dempsey, and the generalized-gamma graphex model of Caron and Fox. A key ingredient for the construction of the family is fragmentation and coagulation duality for integer partitions, and for this we develop novel duality relations that generalize those of Pitman and of Dong, Goldschmidt and Martin. The duality is also crucially used in our inferential procedures.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"126 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}