{"title":"A model-based approach for clustering binned data","authors":"Asael Fabian Martínez, Carlos Díaz-Avalos","doi":"arxiv-2409.07738","DOIUrl":"https://doi.org/arxiv-2409.07738","url":null,"abstract":"Binned data often appears in different fields of research, and it is generated after summarizing the original data in a sequence of pairs of bins (or their midpoints) and frequencies. There may exist different reasons to only provide this summary, but more importantly, it is necessary to be able to perform statistical analyses based only on it. We present a Bayesian nonparametric model for clustering applicable to binned data. Clusters are modeled via random partitions, and within them a model-based approach is assumed. Inferences are performed by a Markov chain Monte Carlo method and the complete proposal is tested using simulated and real data. Having particular interest in studying marine populations, we analyze samples of Lobatus (Strobus) gigas' lengths and find the presence of up to three cohorts along the year.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models","authors":"Jiaqi Li, Johannes Schmidt-Hieber, Wei Biao Wu","doi":"arxiv-2409.07434","DOIUrl":"https://doi.org/arxiv-2409.07434","url":null,"abstract":"This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide quenched central limit theorems (CLT) for the difference between dropout and $\\ell^2$-regularized iterates, regardless of initialization. The CLT for the difference between the Ruppert-Polyak averaged SGD (ASGD) with dropout and $\\ell^2$-regularized iterates is also presented. Based on these asymptotic normality results, we further introduce an online estimator for the long-run covariance matrix of ASGD dropout to facilitate inference in a recursive manner with efficiency in computational time and memory. The numerical experiments demonstrate that for sufficiently large samples, the proposed confidence intervals for ASGD with dropout nearly achieve the nominal coverage probability.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian Process Upper Confidence Bounds in Distributed Point Target Tracking over Wireless Sensor Networks","authors":"Xingchi Liu, Lyudmila Mihaylova, Jemin George, Tien Pham","doi":"arxiv-2409.07652","DOIUrl":"https://doi.org/arxiv-2409.07652","url":null,"abstract":"Uncertainty quantification plays a key role in the development of autonomous systems, decision-making, and tracking over wireless sensor networks (WSNs). However, there is a need to provide uncertainty confidence bounds, especially for distributed machine learning-based tracking, dealing with different volumes of data collected by sensors. This paper aims to fill in this gap and proposes a distributed Gaussian process (DGP) approach for point target tracking and derives upper confidence bounds (UCBs) of the state estimates. A unique contribution of this paper includes the derived theoretical guarantees on the proposed approach and its maximum accuracy for tracking with and without clutter measurements. Particularly, the developed approaches with uncertainty bounds are generic and can provide trustworthy solutions with an increased level of reliability. A novel hybrid Bayesian filtering method is proposed to enhance the DGP approach by adopting a Poisson measurement likelihood model. The proposed approaches are validated over a WSN case study, where sensors have limited sensing ranges. Numerical results demonstrate the tracking accuracy and robustness of the proposed approaches. The derived UCBs constitute a tool for trustworthiness evaluation of DGP approaches. The simulation results reveal that the proposed UCBs successfully encompass the true target states with 88% and 42% higher probability in X and Y coordinates, respectively, when compared to the confidence interval-based method.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"396 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifiability of Polynomial Models from First Principles and via a Gröbner Basis Approach","authors":"Janet D. Godolphin, James D. E. Grant","doi":"arxiv-2409.07062","DOIUrl":"https://doi.org/arxiv-2409.07062","url":null,"abstract":"The relationship between a set of design points and the class of hierarchical polynomial models identifiable from the design is investigated. Saturated models are of particular interest. Necessary and sufficient conditions are derived on the set of design points for specific terms to be included in leaves of the statistical fan. A practitioner led approach to building hierarchical saturated models that are identifiable is developed. This approach is compared to the method of model building based on Gröbner bases. The main results are illustrated by examples.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Where does the tail start? Inflection Points and Maximum Curvature as Boundaries","authors":"Rafael Cabral, Maria de Iorio, Andrea Cremaschi","doi":"arxiv-2409.06308","DOIUrl":"https://doi.org/arxiv-2409.06308","url":null,"abstract":"Understanding the tail behavior of distributions is crucial in statistical theory. For instance, the tail of a distribution plays a ubiquitous role in extreme value statistics, where it is associated with the likelihood of extreme events. There are several ways to characterize the tail of a distribution based on how the tail function, $\\bar{F}(x) = P(X>x)$, behaves when $x\\to\\infty$. However, for unimodal distributions, where does the core of the distribution end and the tail begin? This paper addresses this unresolved question and explores the usage of delimiting points obtained from the derivatives of the density function of continuous random variables, namely, the inflection point and the point of maximum curvature. These points are used to delimit the bulk of the distribution from its tails. We discuss the estimation of these delimiting points and compare them with other measures associated with the tail of a distribution, such as the kurtosis and extreme quantiles. We derive the proposed delimiting points for several known distributions and show that it can be a reasonable criterion for defining the starting point of the tail of a distribution.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic properties of the maximum likelihood estimator for Hidden Markov Models indexed by binary trees","authors":"Julien Weibel","doi":"arxiv-2409.06295","DOIUrl":"https://doi.org/arxiv-2409.06295","url":null,"abstract":"We consider hidden Markov models indexed by a binary tree where the hidden state space is a general metric space. We study the maximum likelihood estimator (MLE) of the model parameters based only on the observed variables. In both stationary and non-stationary regimes, we prove strong consistency and asymptotic normality of the MLE under standard assumptions. Those standard assumptions imply uniform exponential memorylessness properties of the initial distribution conditional on the observations. The proofs rely on ergodic theorems for Markov chains indexed by trees with neighborhood-dependent functions.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Many-sample tests for the equality and the proportionality hypotheses between large covariance matrices","authors":"Tianxing Mei, Chen Wang, Jianfeng Yao","doi":"arxiv-2409.06296","DOIUrl":"https://doi.org/arxiv-2409.06296","url":null,"abstract":"This paper proposes procedures for testing the equality hypothesis and the proportionality hypothesis involving a large number of $q$ covariance matrices of dimension $p\\times p$. Under a limiting scheme where $p$, $q$ and the sample sizes from the $q$ populations grow to infinity in a proper manner, the proposed test statistics are shown to be asymptotically normal. Simulation results show that finite sample properties of the test procedures are satisfactory under both the null and alternatives. As an application, we derive a test procedure for the Kronecker product covariance specification for transposable data. Empirical analysis of datasets from the Mouse Aging Project and the 1000 Genomes Project (phase 3) is also conducted.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enzyme kinetic reactions as interacting particle systems: Stochastic averaging and parameter inference","authors":"Arnab Ganguly, Wasiur R. KhudaBukhsh","doi":"arxiv-2409.06565","DOIUrl":"https://doi.org/arxiv-2409.06565","url":null,"abstract":"We consider a stochastic model of multistage Michaelis-Menten (MM) type enzyme kinetic reactions describing the conversion of substrate molecules to a product through several intermediate species. The high-dimensional, multiscale nature of these reaction networks presents significant computational challenges, especially in statistical estimation of reaction rates. This difficulty is amplified when direct data on system states are unavailable, and one only has access to a random sample of product formation times. To address this, we proceed in two stages. First, under certain technical assumptions akin to those made in the Quasi-steady-state approximation (QSSA) literature, we prove two asymptotic results: a stochastic averaging principle that yields a lower-dimensional model, and a functional central limit theorem that quantifies the associated fluctuations. Next, for statistical inference of the parameters of the original MM reaction network, we develop a mathematical framework involving an interacting particle system (IPS) and prove a propagation of chaos result that allows us to write a product-form likelihood function. The novelty of the IPS-based inference method is that it does not require information about the state of the system and works with only a random sample of product formation times. We provide numerical examples to illustrate the efficacy of the theoretical results.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"255 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Sparsity and Sub-Gaussianity in the Johnson-Lindenstrauss Lemma","authors":"Aurélien Garivier, Emmanuel Pilliat","doi":"arxiv-2409.06275","DOIUrl":"https://doi.org/arxiv-2409.06275","url":null,"abstract":"We provide a simple proof of the Johnson-Lindenstrauss lemma for sub-Gaussian variables. We extend the analysis to identify how sparse projections can be, and what the cost of sparsity is on the target dimension. The Johnson-Lindenstrauss lemma is the theoretical core of the dimensionality reduction methods based on random projections. While its original formulation involves matrices with Gaussian entries, the computational cost of random projections can be drastically reduced by the use of simpler variables, especially if they vanish with a high probability. In this paper, we propose a simple and elementary analysis of random projections under classical assumptions that emphasizes the key role of sub-Gaussianity. Furthermore, we show how to extend it to sparse projections, emphasizing the limits induced by the sparsity of the data itself.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrapping Estimators based on the Block Maxima Method","authors":"Axel Bücher, Torben Staud","doi":"arxiv-2409.05529","DOIUrl":"https://doi.org/arxiv-2409.05529","url":null,"abstract":"The block maxima method is a standard approach for analyzing the extremal behavior of a potentially multivariate time series. It has recently been found that the classical approach based on disjoint block maxima may be universally improved by considering sliding block maxima instead. However, the asymptotic variance formula for estimators based on sliding block maxima involves an integral over the covariance of a certain family of multivariate extreme value distributions, which makes its estimation, and inference in general, an intricate problem. As an alternative, one may rely on bootstrap approximations: we show that naive block-bootstrap approaches from time series analysis are inconsistent even in i.i.d. situations, and provide a consistent alternative based on resampling circular block maxima. As a by-product, we show consistency of the classical resampling bootstrap for disjoint block maxima, and that estimators based on circular block maxima have the same asymptotic variance as their sliding block maxima counterparts. The finite sample properties are illustrated by Monte Carlo experiments, and the methods are demonstrated by a case study of precipitation extremes.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}