Computational Statistics: latest articles (page 9)

An effective method for identifying clusters of robot strengths

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-12-11 DOI: 10.1007/s00180-023-01442-5

Jen-Chieh Teng, Chin-Tsang Chiang, Alvin Lim

Abstract: In the analysis of qualification stage data from FIRST Robotics Competition (FRC) championships, the ratio (1.67–1.68) of the number of observations (110–114 matches) to the number of parameters (66–68 robots) in each division has been found to be quite small for the most commonly used winning margin power rating (WMPR) model. This usually leads to imprecise estimates and inaccurate predictions in such three-on-three matches that FRC tournaments are composed of. With the recognition of a clustering feature in estimated robot strengths, a more flexible model with latent clusters of robots was proposed to alleviate overparameterization of the WMPR model. Since its structure can be regarded as a dimension reduction of the parameter space in the WMPR model, the identification of clusters of robot strengths is naturally transformed into a model selection problem. Instead of comparing a huge number of competing models (\(7.76\times 10^{67}\) to \(3.66\times 10^{70}\)), we develop an effective method to estimate the number of clusters, clusters of robots and robot strengths in the format of qualification stage data from the FRC championships. The new method consists of two parts: (i) a combination of hierarchical and non-hierarchical classifications to determine candidate models; and (ii) variant goodness-of-fit criteria to select optimal models. In contrast to existing hierarchical classification, each step of our proposed non-hierarchical classification is based on estimated robot strengths from a candidate model in the preceding non-hierarchical classification step. A great advantage of the proposed methodology is its ability to consider the possibility of reassigning robots to other clusters. To reduce overestimation of the number of clusters by the mean squared prediction error criteria, corresponding Bayesian information criteria are further established as alternatives for model selection. With a coherent assembly of these essential elements, a systematic procedure is presented to perform the estimation of parameters. In addition, we propose two indices to measure the nested relation between clusters from any two models and monotonic association between robot strengths from any two models. Data from the 2018 and 2019 FRC championships and a simulation study are also used to illustrate the applicability and superiority of our proposed methodology.
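
The counts quoted in the abstract above can be explored with a small sketch. It computes the observations-to-parameters ratio and the number of ways to partition n robots into non-empty clusters (the Bell number). Two things here are assumptions rather than statements from the abstract: that the 110-match division has 66 robots and the 114-match division has 68, and that the "number of competing models" is a count of set partitions. The function names `bell` and `obs_per_param` are hypothetical.

```python
# Illustrative sketch only; not taken from the paper.

def bell(n: int) -> int:
    """Number of partitions of n labelled items into non-empty clusters.

    Computed with the Bell triangle: each new row starts with the last
    entry of the previous row, and every further entry adds the entry
    above-left to the entry on its left.
    """
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def obs_per_param(matches: int, robots: int) -> float:
    """Observations (matches) per parameter (one strength per robot)."""
    return matches / robots


if __name__ == "__main__":
    # Pairing of match counts with robot counts is an assumption.
    for matches, robots in [(110, 66), (114, 68)]:
        ratio = obs_per_param(matches, robots)
        print(f"{matches} matches, {robots} robots: "
              f"ratio {ratio:.2f}, set partitions {bell(robots):.2e}")
```

Running the script prints the ratio and the partition count for both assumed division sizes, which can be compared with the ranges given in the abstract.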

Citations: 0

High dimensional controlled variable selection with model-X knockoffs in the AFT model

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-12-09 DOI: 10.1007/s00180-023-01426-5

Baihua He, Di Xia, Yingli Pan

Citations: 0

Dimension reduction and visualization of multiple time series data: a symbolic data analysis approach

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-12-06 DOI: 10.1007/s00180-023-01440-7

Emily Chia-Yu Su, Han-Ming Wu

Abstract: Exploratory analysis and visualization of multiple time series data are essential for discovering the underlying dynamics of a series before attempting modeling and forecasting. This study extends two dimension reduction methods - principal component analysis (PCA) and sliced inverse regression (SIR) - to multiple time series data. This is achieved through the innovative path point approach, a new addition to the symbolic data analysis framework. By transforming multiple time series data into time-dependent intervals marked by starting and ending values, each series is geometrically represented as successive directed segments with unique path points. These path points serve as the foundation of our novel representation approach. PCA and SIR are then applied to the data table formed by the coordinates of these path points, enabling visualization of temporal trajectories of objects within a reduced-dimensional subspace. Empirical studies encompassing simulations, microarray time series data from a yeast cell cycle, and financial data confirm the effectiveness of our path point approach in revealing the structure and behavior of objects within a 2D factorial plane. Comparative analyses with existing methods, such as the applied vector approach for PCA and SIR on time-dependent interval data, further underscore the strength and versatility of our path point representation in the realm of time series data.

Citations: 0

An expectation maximization algorithm for the hidden markov models with multiparameter student-t observations

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-12-06 DOI: 10.1007/s00180-023-01432-7

Emna Ghorbel, Mahdi Louati

Abstract: Hidden Markov models are a class of probabilistic graphical models used to describe the evolution of a sequence of unknown variables from a set of observed variables. They are statistical models introduced by Baum and Petrie in Baum (JMA 101:789–810) and belong to the class of latent variable models. Initially developed and applied in the context of speech recognition, they have attracted much attention in many fields of application. The central objective of this research work is upon an extension of these models. More accurately, we define multiparameter hidden Markov models, using multiple observation processes and the Riesz distribution on the space of symmetric matrices as a natural extension of the gamma one. Some basic related properties are discussed and marginal and posterior distributions are derived. We conduct the Forward-Backward dynamic programming algorithm and the classical Expectation Maximization algorithm to estimate the global set of parameters. Using simulated data, the performance of these estimators is conveniently achieved by the Matlab program. This allows us to assess the quality of the proposed estimators by means of the mean square errors between the true and the estimated values.

Citations: 0

Sequential linear regression for conditional mean imputation of longitudinal continuous outcomes under reference-based assumptions

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-12-03 DOI: 10.1007/s00180-023-01439-0

Sean Yiu

Abstract: In clinical trials of longitudinal continuous outcomes, reference based imputation (RBI) has commonly been applied to handle missing outcome data in settings where the estimand incorporates the effects of intercurrent events, e.g. treatment discontinuation. RBI was originally developed in the multiple imputation framework, however recently conditional mean imputation (CMI) combined with the jackknife estimator of the standard error was proposed as a way to obtain deterministic treatment effect estimates and correct frequentist inference. For both multiple and CMI, a mixed model for repeated measures (MMRM) is often used for the imputation model, but this can be computationally intensive to fit to multiple data sets (e.g. the jackknife samples) and lead to convergence issues with complex MMRM models with many parameters. Therefore, a step-wise approach based on sequential linear regression (SLR) of the outcomes at each visit was developed for the imputation model in the multiple imputation framework, but similar developments in the CMI framework are lacking. In this article, we fill this gap in the literature by proposing a SLR approach to implement RBI in the CMI framework, and justify its validity using theoretical results and simulations. We also illustrate our proposal on a real data application.

Citations: 0

Pair programming with ChatGPT for sampling and estimation of copulas

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-12-01 DOI: 10.1007/s00180-023-01437-2

Jan Górecki

Abstract: Without writing a single line of code by a human, an example Monte Carlo simulation-based application for stochastic dependence modeling with copulas is developed through pair programming involving a human partner and a large language model (LLM) fine-tuned for conversations. This process encompasses interacting with ChatGPT using both natural language and mathematical formalism. Under the careful supervision of a human expert, this interaction facilitated the creation of functioning code in MATLAB, Python, and R. The code performs a variety of tasks including sampling from a given copula model, evaluating the model’s density, conducting maximum likelihood estimation, optimizing for parallel computing on CPUs and GPUs, and visualizing the computed results. In contrast to other emerging studies that assess the accuracy of LLMs like ChatGPT on tasks from a selected area, this work rather investigates ways how to achieve a successful solution of a standard statistical task in a collaboration of a human expert and artificial intelligence (AI). Particularly, through careful prompt engineering, we separate successful solutions generated by ChatGPT from unsuccessful ones, resulting in a comprehensive list of related pros and cons. It is demonstrated that if the typical pitfalls are avoided, we can substantially benefit from collaborating with an AI partner. For example, we show that if ChatGPT is not able to provide a correct solution due to a lack of or incorrect knowledge, the human-expert can feed it with the correct knowledge, e.g., in the form of mathematical theorems and formulas, and make it to apply the gained knowledge in order to provide a correct solution. Such ability presents an attractive opportunity to achieve a programmed solution even for users with rather limited knowledge of programming techniques.
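
To make the three tasks named in the abstract concrete (sampling from a copula, evaluating its density, maximum likelihood estimation), here is a minimal standard-library sketch. The choice of the bivariate Clayton copula, the Marshall-Olkin sampling scheme, and the grid-search estimator are assumptions made for illustration; the abstract does not say which copula family or algorithms the paper uses. All function names are hypothetical.

```python
# Illustrative sketch only; not the code produced in the paper.
import math
import random


def sample_clayton(n, theta, rng):
    """Draw n pairs from a bivariate Clayton copula (theta > 0).

    Marshall-Olkin scheme: with V ~ Gamma(1/theta, 1) and E1, E2 ~ Exp(1),
    U_i = (1 + E_i / V) ** (-1 / theta).
    """
    pairs = []
    for _ in range(n):
        v = max(rng.gammavariate(1.0 / theta, 1.0), 1e-300)  # avoid division by zero
        u1 = (1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
        u2 = (1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
        pairs.append((u1, u2))
    return pairs


def clayton_log_density(u1, u2, theta):
    """Log of c(u1, u2) = (1+theta) (u1 u2)^(-theta-1) (u1^-theta + u2^-theta - 1)^(-2-1/theta)."""
    eps = 1e-12  # keep the arguments strictly inside (0, 1)
    u1 = min(max(u1, eps), 1.0 - eps)
    u2 = min(max(u2, eps), 1.0 - eps)
    s = u1 ** -theta + u2 ** -theta - 1.0
    return (math.log1p(theta)
            - (theta + 1.0) * (math.log(u1) + math.log(u2))
            - (2.0 + 1.0 / theta) * math.log(s))


def fit_clayton(pairs, grid=None):
    """Maximum likelihood estimate of theta by plain grid search."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    return max(grid, key=lambda t: sum(clayton_log_density(a, b, t) for a, b in pairs))


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_clayton(2000, 2.0, rng)
    print("estimated theta:", fit_clayton(data))
```

A grid search is used instead of a numerical optimizer only to keep the sketch dependency-free; in practice a one-dimensional optimizer would replace it.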

Citations: 0

Wavelet-based Bayesian approximate kernel method for high-dimensional data analysis

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-11-26 DOI: 10.1007/s00180-023-01438-1

Wenxing Guo, Xueying Zhang, Bei Jiang, Linglong Kong, Yaozhong Hu

Citations: 0

Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-11-24 DOI: 10.1007/s00180-023-01433-6

Tianming Zhu, Pengfei Wang, Jin-Ting Zhang

Abstract: The problem of testing the equality of mean vectors for high-dimensional data has been intensively investigated in the literature. However, most of the existing tests impose strong assumptions on the underlying group covariance matrices which may not be satisfied or hardly be checked in practice. In this article, an F-type test for two-sample Behrens–Fisher problems for high-dimensional data is proposed and studied. When the two samples are normally distributed and when the null hypothesis is valid, the proposed F-type test statistic is shown to be an F-type mixture, a ratio of two independent \(\chi^2\)-type mixtures. Under some regularity conditions and the null hypothesis, it is shown that the proposed F-type test statistic and the above F-type mixture have the same normal and non-normal limits. It is then justified to approximate the null distribution of the proposed F-type test statistic by that of the F-type mixture, resulting in the so-called normal reference F-type test. Since the F-type mixture is a ratio of two independent \(\chi^2\)-type mixtures, we employ the Welch–Satterthwaite \(\chi^2\)-approximation to the distributions of the numerator and the denominator of the F-type mixture respectively, resulting in an approximation F-distribution whose degrees of freedom can be consistently estimated from the data. The asymptotic power of the proposed F-type test is established. Two simulation studies are conducted and they show that in terms of size control, the proposed F-type test outperforms two existing competitors. The good performance of the proposed F-type test is also illustrated by a COVID-19 data example.
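
The Welch–Satterthwaite step mentioned in the abstract can be illustrated with a generic two-moment matching calculation. The notation below (weights \(\lambda_r\), scale \(\beta\), degrees of freedom \(d\)) is assumed for illustration and is not taken from the paper; it shows the standard form of the approximation, not necessarily the authors' exact statistic.

```latex
% Generic chi-square-type mixture (notation assumed, not from the paper):
T = \sum_{r} \lambda_r A_r, \qquad A_r \overset{\text{i.i.d.}}{\sim} \chi^2_1, \quad \lambda_r \ge 0,
\qquad \mathrm{E}(T) = \sum_r \lambda_r, \qquad \mathrm{Var}(T) = 2 \sum_r \lambda_r^2 .

% Approximate T by \beta \chi^2_d, matching mean and variance:
\beta d = \sum_r \lambda_r, \qquad 2 \beta^2 d = 2 \sum_r \lambda_r^2
\;\Longrightarrow\;
\beta = \frac{\sum_r \lambda_r^2}{\sum_r \lambda_r}, \qquad
d = \frac{\left(\sum_r \lambda_r\right)^2}{\sum_r \lambda_r^2} .

% For two independent mixtures T_1, T_2 with (\beta_1, d_1) and (\beta_2, d_2):
\frac{T_1 / (\beta_1 d_1)}{T_2 / (\beta_2 d_2)} \;\overset{\text{approx.}}{\sim}\; F_{d_1,\, d_2} .
```

Because \(d_1\) and \(d_2\) depend only on sums of the weights and of their squares, they can be estimated from the data once those sums are estimated, which is the sense in which the abstract speaks of estimable degrees of freedom.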

Citations: 0

A new bandwidth selection method for nonparametric modal regression based on generalized hyperbolic distributions

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-11-18 DOI: 10.1007/s00180-023-01435-4

Hongpeng Yuan, Sijia Xiang, Weixin Yao

Abstract: As a complement to standard mean and quantile regression, nonparametric modal regression has been broadly applied in various fields. By focusing on the most likely conditional value of Y given x, the nonparametric modal regression is shown to be resistant to outliers and some forms of measurement error, and the prediction intervals are shorter when data is skewed. However, the bandwidth selection is critical but very challenging, since the traditional least-squares based cross-validation method cannot be applied. We propose to select the bandwidth by applying the asymptotic global optimal bandwidth and the flexible generalized hyperbolic (GH) distribution as the distribution of the error. Unlike the plug-in method, the new method does not require preliminary parameters to be chosen in advance, is easy to compute by any statistical software, and is computationally efficient compared to the existing kernel density estimator (KDE) based method. Numerical studies show that the GH based bandwidth performs better than existing bandwidth selector, in terms of higher coverage probabilities. Real data applications also illustrate the superior performance of the new bandwidth.

Citations: 0

Simultaneous subgroup identification and variable selection for high dimensional data

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2023-11-17 DOI: 10.1007/s00180-023-01436-3

Huicong Yu, Jiaqi Wu, Weiping Zhang

Citations: 0