{"title":"An embedded diachronic sense change model with a case study from ancient Greek","authors":"Schyan Zafar, Geoff K. Nicholls","doi":"10.1016/j.csda.2024.108011","DOIUrl":"https://doi.org/10.1016/j.csda.2024.108011","url":null,"abstract":"<div><p>Word meanings change over time, and word <em>senses</em> evolve, emerge or die out in the process. For ancient languages, where the corpora are often small and sparse, modelling such changes accurately proves challenging, and quantifying uncertainty in sense-change estimates consequently becomes important. GASC (Genre-Aware Semantic Change) and DiSC (Diachronic Sense Change) are existing generative models that have been used to analyse sense change for target words from an ancient Greek text corpus, using unsupervised learning without the help of any pre-training. These models represent the senses of a given target word such as “kosmos” (meaning decoration, order or world) as distributions over context words, and sense prevalence as a distribution over senses. The models are fitted using Markov Chain Monte Carlo (MCMC) methods to measure temporal changes in these representations. This paper introduces EDiSC, an Embedded DiSC model, which combines word embeddings with DiSC to provide superior model performance. It is shown empirically that EDiSC offers improved predictive accuracy, ground-truth recovery and uncertainty quantification, as well as better sampling efficiency and scalability properties with MCMC methods. The challenges of fitting these models are also discussed.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108011"},"PeriodicalIF":1.5,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000951/pdfft?md5=12930590074b9c3008e514576f2c4ba0&pid=1-s2.0-S0167947324000951-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141485448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A double Pólya-Gamma data augmentation scheme for a hierarchical Negative Binomial - Binomial data model","authors":"Xuan Ma, Jenný Brynjarsdóttir, Thomas LaFramboise","doi":"10.1016/j.csda.2024.108009","DOIUrl":"https://doi.org/10.1016/j.csda.2024.108009","url":null,"abstract":"<div><p>A double Pólya-Gamma data augmentation scheme is developed for posterior sampling from a Bayesian hierarchical model of total and categorical count data. The scheme applies to a Negative Binomial - Binomial (NBB) hierarchical regression model with logit links and normal priors on regression coefficients. The approach is shown to be very efficient and in most cases out-performs the Stan program. The hierarchical modeling framework and the Pólya-Gamma data augmentation scheme are applied to human mitochondrial DNA data.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108009"},"PeriodicalIF":1.5,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000938/pdfft?md5=5e06b3420d4ee7efb587c1f231e8d551&pid=1-s2.0-S0167947324000938-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141485449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strong orthogonal Latin hypercubes for computer experiments","authors":"Chunyan Wang, Dennis K.J. Lin","doi":"10.1016/j.csda.2024.107999","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107999","url":null,"abstract":"<div><p>Orthogonal Latin hypercubes are widely used for computer experiments. They achieve both orthogonality and the maximum one-dimensional stratification property. When two-factor (and higher-order) interactions are active, two- and three-dimensional stratifications are also important. Unfortunately, little is known about orthogonal Latin hypercubes with good two- (and higher-) dimensional stratification properties. A method is proposed for constructing a new class of orthogonal Latin hypercubes whose columns can be partitioned into groups, such that the columns from different groups maintain two- and three-dimensional stratification properties. The proposed designs perform well under almost all popular criteria (e.g., the orthogonality, stratification, and maximin distance criteria), which makes them highly suitable designs for computer experiments. The construction method is straightforward to implement, and the relevant theoretical support is well established. The proposed strong orthogonal Latin hypercubes are tabulated for practical needs.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107999"},"PeriodicalIF":1.5,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141481268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonnegative GARCH-type models with conditional Gamma distributions and their applications","authors":"Eunju Hwang, ChanHyeok Jeon","doi":"10.1016/j.csda.2024.108006","DOIUrl":"10.1016/j.csda.2024.108006","url":null,"abstract":"<div><p>Most real data are characterized by positive, asymmetric and skewed distributions of various shapes. Modelling and forecasting of such data are addressed by proposing nonnegative conditional heteroscedastic time series models with Gamma distributions. Three types of time-varying parameters of Gamma distributions are adopted to construct the nonnegative GARCH models. A condition for the existence of a stationary Gamma-GARCH model is given. Parameter estimation via the maximum likelihood estimation (MLE) method is discussed. A Monte-Carlo study is conducted to illustrate sample paths of the proposed models and to assess the finite-sample validity of the MLEs, as well as to evaluate model diagnostics using standardized Pearson residuals. Furthermore, out-of-sample forecasting analysis is performed to compute forecasting accuracy measures. Applications to oil price and Bitcoin data are given.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 108006"},"PeriodicalIF":1.5,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141395917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional mean dimension reduction for tensor time series","authors":"Chung Eun Lee, Xin Zhang","doi":"10.1016/j.csda.2024.107998","DOIUrl":"10.1016/j.csda.2024.107998","url":null,"abstract":"<div><p>The dimension reduction problem for a stationary tensor time series is addressed. The goal is to remove linear combinations of the tensor time series that are mean independent of the past, without imposing any parametric models or distributional assumptions. To achieve this goal, a new metric called cumulative tensor martingale difference divergence is introduced and its theoretical properties are studied. Unlike existing methods, the proposed approach achieves dimension reduction by estimating a distinctive subspace that can fully retain the conditional mean information. By focusing on the conditional mean, the proposed dimension reduction method is potentially more accurate in prediction. The method can be viewed as a factor model-based approach that extends the existing techniques for estimating the central subspace or central mean subspace in vector time series. The effectiveness of the proposed method is illustrated by extensive simulations and two real-world data applications.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 107998"},"PeriodicalIF":1.5,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141389420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study of imputation procedures for nonparametric density estimation based on missing censored lifetimes","authors":"Sam Efromovich, Lirit Fuksman","doi":"10.1016/j.csda.2024.107994","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107994","url":null,"abstract":"<div><p>Imputation is a standard procedure in dealing with missing data and there are many competing imputation methods. It is proposed to analyze imputation procedures via comparison with a benchmark developed by the asymptotic theory. The model considered is nonparametric density estimation of the missing right censored lifetime of interest. This model is of special interest for understanding imputation because each underlying observation is the pair of censored lifetime and indicator of censoring. The latter creates a number of interesting scenarios and challenges for imputation when the best methods may or may not be applicable. Further, the theory sheds light on why the effect of imputation depends on an underlying density. The methodology is tested on real-life datasets and via intensive simulations. Data and R code are provided.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107994"},"PeriodicalIF":1.8,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141308344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference for high-dimensional linear expectile regression with de-biasing method","authors":"Xiang Li, Yu-Ning Li, Li-Xin Zhang, Jun Zhao","doi":"10.1016/j.csda.2024.107997","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107997","url":null,"abstract":"<div><p>The methodology for the inference problem in high-dimensional linear expectile regression is developed. By transforming the expectile loss into a weighted-least-squares form and applying a de-biasing strategy, Wald-type tests for multiple constraints within a regularized framework are established. An estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension is constructed using general amenable regularizers, including Lasso and SCAD, with its consistency demonstrated through a novel proof technique. Simulation studies and real data applications demonstrate the efficacy of the proposed test statistic in both homoscedastic and heteroscedastic scenarios.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107997"},"PeriodicalIF":1.8,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141324737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent event history models for quasi-reaction systems","authors":"Matteo Framba, Veronica Vinciotti, Ernst C. Wit","doi":"10.1016/j.csda.2024.107996","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107996","url":null,"abstract":"<div><p>Various processes, such as cell differentiation and disease spreading, can be modelled as quasi-reaction systems of particles using stochastic differential equations. The existing Local Linear Approximation (LLA) method infers the parameters driving these systems from measurements of particle abundances over time. While dense observations of the process in time should in theory improve parameter estimation, LLA fails in these situations due to numerical instability. Defining a latent event history model of the underlying quasi-reaction system resolves this problem. A computationally efficient Expectation-Maximization algorithm is proposed for parameter estimation, incorporating an extended Kalman filter for evaluating the latent reactions. A simulation study demonstrates the method's performance and highlights the settings where it is particularly advantageous compared to the existing LLA approaches. An illustration of the method applied to the diffusion of COVID-19 in Italy is presented.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107996"},"PeriodicalIF":1.8,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S016794732400080X/pdfft?md5=524e7377774b8a5df2e3a994373e6394&pid=1-s2.0-S016794732400080X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141243341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medoid splits for efficient random forests in metric spaces","authors":"Matthieu Bulté, Helle Sørensen","doi":"10.1016/j.csda.2024.107995","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107995","url":null,"abstract":"<div><p>An adaptation of the random forest algorithm for Fréchet regression is revisited, addressing the challenge of regression with random objects in metric spaces. To overcome the limitations of previous approaches, a new splitting rule is introduced, substituting the computationally expensive Fréchet means with a medoid-based approach. The asymptotic equivalence of this method to Fréchet mean-based procedures is demonstrated, along with the consistency of the associated regression estimator. This approach provides a sound theoretical framework and a more efficient computational solution to Fréchet regression, broadening its application to non-standard data types and complex use cases.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107995"},"PeriodicalIF":1.5,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000793/pdfft?md5=90ce48cb2e6d039f213ac81b5b60098d&pid=1-s2.0-S0167947324000793-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141481267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistent skinny Gibbs in probit regression","authors":"Jiarong Ouyang, Xuan Cao","doi":"10.1016/j.csda.2024.107993","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107993","url":null,"abstract":"<div><p>Spike and slab priors have emerged as effective and computationally scalable tools for Bayesian variable selection in high-dimensional linear regression. However, the crucial model selection consistency and efficient computational strategies using spike and slab priors in probit regression have rarely been investigated. A hierarchical probit model with continuous spike and slab priors over regression coefficients is considered, and a highly scalable Gibbs sampler with a computational complexity that grows only linearly in the dimension of predictors is proposed. Specifically, the “Skinny Gibbs” algorithm is adapted to the setting of probit and negative binomial regression, and model selection consistency for the proposed method under the probit model is established when the number of covariates is allowed to grow much larger than the sample size. Through simulation studies, the method is shown to achieve superior empirical performance compared with other state-of-the-art methods. Gene expression data from 51 asthmatic and 44 non-asthmatic samples are analyzed and the performance for predicting asthma using the proposed approach is compared with existing approaches.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107993"},"PeriodicalIF":1.8,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141243339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}