{"title":"A limit formula and a series expansion for the bivariate Normal tail probability","authors":"Siu-Kui Au","doi":"10.1007/s11222-024-10466-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10466-w","url":null,"abstract":"<p>This work presents a limit formula for the bivariate Normal tail probability. It only requires the larger threshold to grow indefinitely, but otherwise has no restrictions on how the thresholds grow. The correlation parameter can change and possibly depend on the thresholds. The formula is applicable regardless of Savage’s condition. Asymptotically, it reduces to Ruben’s formula and Hashorva’s formula under the corresponding conditions, and therefore can be considered a generalisation. Under a mild condition, it satisfies Plackett’s identity on the derivative with respect to the correlation parameter. Motivated by the limit formula, a series expansion is also obtained for the exact tail probability using derivatives of the univariate Mills ratio. Under similar conditions for the limit formula, the series converges and its truncated approximation has a small remainder term for large thresholds. To take advantage of this, a simple procedure is developed for the general case by remapping the parameters so that they satisfy the conditions. Examples are presented to illustrate the theoretical findings.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141575837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifier-dependent feature selection via greedy methods","authors":"Fabiana Camattari, Sabrina Guastavino, Francesco Marchetti, Michele Piana, Emma Perracchione","doi":"10.1007/s11222-024-10460-2","DOIUrl":"https://doi.org/10.1007/s11222-024-10460-2","url":null,"abstract":"<p>The purpose of this study is to introduce a new approach to feature ranking for classification tasks, called greedy feature selection in what follows. In statistical learning, feature selection is usually realized by means of methods that are independent of the classifier applied to perform the prediction using that reduced number of features. Instead, greedy feature selection identifies the most important feature at each step according to the selected classifier. The benefits of such a scheme are investigated in terms of model capacity indicators, such as the Vapnik-Chervonenkis dimension or kernel alignment. This theoretical study proves that the iterative greedy algorithm is able to construct classifiers whose capacity grows at each step. The proposed method is then tested numerically on various datasets and compared to state-of-the-art techniques. The results show that our iterative scheme is able to capture only a few relevant features, and may improve, especially for real and noisy data, the accuracy scores of other techniques. The greedy scheme is also applied to the challenging application of predicting geo-effective manifestations of the active Sun.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141575838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locally sparse and robust partial least squares in scalar-on-function regression","authors":"Sude Gurer, Han Lin Shang, Abhijit Mandal, Ufuk Beyaztas","doi":"10.1007/s11222-024-10464-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10464-y","url":null,"abstract":"<p>We present a novel approach for estimating a scalar-on-function regression model, leveraging a functional partial least squares methodology. Our proposed method involves computing the functional partial least squares components through sparse partial robust M regression, facilitating robust and locally sparse estimation of the regression coefficient function. This strategy delivers a robust decomposition for the functional predictor and regression coefficient functions. After the decomposition, model parameters are estimated using a weighted loss function, incorporating robustness through iterative reweighting of the partial least squares components. The robust decomposition feature of our proposed method enables the robust estimation of model parameters in the scalar-on-function regression model, ensuring reliable predictions in the presence of outliers and leverage points. Moreover, it accurately identifies zero and nonzero sub-regions where the slope function is estimated, even in the presence of outliers and leverage points. We assess our proposed method’s estimation and predictive performance through a series of Monte Carlo experiments and an empirical dataset—that is, data collected in relation to oriented strand board. Compared to existing methods, our proposed method performs favorably. Notably, our robust procedure exhibits superior performance in the presence of outliers while maintaining competitiveness in their absence. Our method has been implemented in the <span>robsfplsr</span> package in R.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141575839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Shapley performance attribution for least-squares regression","authors":"Logan Bell, Nikhil Devanathan, Stephen Boyd","doi":"10.1007/s11222-024-10459-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10459-9","url":null,"abstract":"<p>We consider the performance of a least-squares regression model, as judged by out-of-sample <span>(R^2)</span>. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speed up, allowing many more Monte Carlo samples to be evaluated, achieving better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141546756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On weak convergence of quantile-based empirical likelihood process for ROC curves","authors":"Hu Jiang, Liu Yiming, Zhou Wang","doi":"10.1007/s11222-024-10457-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10457-x","url":null,"abstract":"<p>The empirical likelihood (EL) method possesses desirable qualities such as automatically determining confidence regions and circumventing the need for variance estimation. As an extension, a quantile-based EL (QEL) method is considered, which results in a simpler form. In this paper, we explore the framework of the QEL method. Firstly, we explore the weak convergence of the −2 log empirical likelihood ratio for ROC curves. We also introduce a novel statistic for testing the entire ROC curve and the equality of two distributions. To validate our approach, we conduct simulation studies and analyze real data from hepatitis C patients, comparing our method with existing ones.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141546755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systemic infinitesimal over-dispersion on graphical dynamic models","authors":"Ning Ning, Edward Ionides","doi":"10.1007/s11222-024-10443-3","DOIUrl":"https://doi.org/10.1007/s11222-024-10443-3","url":null,"abstract":"<p>Stochastic models for collections of interacting populations have crucial roles in many scientific fields such as epidemiology, ecology, performance engineering, and queueing theory, to name a few. However, the standard approach to extending an ordinary differential equation model to a Markov chain does not have sufficient flexibility in the mean-variance relationship to match data. To handle that, we develop new approaches using Dirichlet noise to construct collections of independent or dependent noise processes. This permits the modeling of high-frequency variation in transition rates both within and between the populations under study. Our theory is developed in a general framework of time-inhomogeneous Markov processes equipped with a general graphical structure. We demonstrate our approach on a widely analyzed measles dataset, adding Dirichlet noise to a classical Susceptible–Exposed–Infected–Recovered model. Our methodology shows improved statistical fit measured by log-likelihood and provides new insights into the dynamics of this biological system.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Greedy recursive spectral bisection for modularity-bound hierarchical divisive community detection","authors":"Douglas O. Cardoso, João Domingos Gomes da Silva Junior, Carla Silva Oliveira, Celso Marques, Laura Silva de Assis","doi":"10.1007/s11222-024-10451-3","DOIUrl":"https://doi.org/10.1007/s11222-024-10451-3","url":null,"abstract":"<p>Spectral clustering techniques depend on the eigenstructure of a similarity matrix to assign data points to clusters, so that points within the same cluster exhibit high similarity compared to those in different clusters. This work aimed to develop a spectral method that could be compared to clustering algorithms that represent the current state of the art. This investigation conceived a novel spectral clustering method, as well as five policies that guide its execution, based on spectral graph theory and embodying hierarchical clustering principles. Computational experiments were undertaken to compare the proposed method with six state-of-the-art algorithms. The assessment was performed using two evaluation metrics: the adjusted Rand index and modularity. The results furnish compelling evidence that the proposed method is competitive and possesses distinctive properties compared to those elucidated in the existing literature. This suggests that our approach stands as a viable alternative, offering a robust choice within the spectrum of available same-purpose tools.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Bayesian quantile regression based on the generalized asymmetric Huberised-type distribution","authors":"Weitao Hu, Weiping Zhang","doi":"10.1007/s11222-024-10453-1","DOIUrl":"https://doi.org/10.1007/s11222-024-10453-1","url":null,"abstract":"<p>To enhance the robustness and flexibility of Bayesian quantile regression models using the asymmetric Laplace or asymmetric Huberised-type (AH) distribution, which lacks changeable mode, diminishing influence of outliers, and asymmetry under median regression, we propose a new generalized AH distribution which is achieved through a hierarchical mixture representation, thus leading to a flexible Bayesian Huberised quantile regression framework. With many parameters in the model, we develop an efficient Markov chain Monte Carlo procedure based on the Metropolis-within-Gibbs sampling algorithm. The robustness and flexibility of the new distribution are examined through intensive simulation studies and application to two real data sets.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured prior distributions for the covariance matrix in latent factor models","authors":"Sarah Elizabeth Heaps, Ian Hyla Jermyn","doi":"10.1007/s11222-024-10454-0","DOIUrl":"https://doi.org/10.1007/s11222-024-10454-0","url":null,"abstract":"<p>Factor models are widely used for dimension reduction in the analysis of multivariate data. This is achieved through decomposition of a <span>(p times p)</span> covariance matrix into the sum of two components. Through a latent factor representation, they can be interpreted as a diagonal matrix of idiosyncratic variances and a shared variation matrix, that is, the product of a <span>(p times k)</span> factor loadings matrix and its transpose. If <span>(k ll p)</span>, this defines a parsimonious factorisation of the covariance matrix. Historically, little attention has been paid to incorporating prior information in Bayesian analyses using factor models where, at best, the prior for the factor loadings is order invariant. In this work, a class of structured priors is developed that can encode ideas of dependence structure about the shared variation matrix. The construction allows data-informed shrinkage towards sensible parametric structures while also facilitating inference over the number of factors. Using an unconstrained reparameterisation of stationary vector autoregressions, the methodology is extended to stationary dynamic factor models. For computational inference, parameter-expanded Markov chain Monte Carlo samplers are proposed, including an efficient adaptive Gibbs sampler. Two substantive applications showcase the scope of the methodology and its inferential benefits.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}