Statistical Papers: Latest Articles

Handling skewness and directional tails in model-based clustering.

IF 1.2, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2025-01-01 Epub Date: 2025-07-04 DOI: 10.1007/s00362-025-01723-9

Cristina Tortora, Antonio Punzo, Brian C Franczak

Abstract: Model-based clustering is a powerful approach used in data analysis to unveil underlying patterns or groups within a data set. However, when applied to clusters that exhibit skewness, heavy tails, or both, the classification of data points becomes more challenging. In this study, we introduce two models considering two component-wise transformations of the observed data within a mixture of multiple scaled contaminated normal (MSCN) distributions. MSCN distributions are designed to enable a different tail behavior in each dimension and directional outlier detection in the direction of the principal components. Using the transformed MSCN distributions as components of a mixture, we obtain model-based clustering techniques that allow for 1) flexible cluster shapes in terms of skewness and kurtosis and 2) component-wise and directional outlier detection. We assess the efficacy of the proposed techniques by comparing them with model-based clustering methods that perform global or component-wise outlier detection using simulated and real data sets. This comparative analysis aims to show in which practical clustering scenarios the proposed MSCN-based approaches are advantageous.

Statistical Papers, vol. 66(5), 114. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12226708/pdf/
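The contaminated normal building block named in the abstract can be illustrated in one dimension. The sketch below is not the authors' MSCN model (which is multivariate, works along principal components, and adds data transformations); it only shows the standard idea of mixing a "good" normal with an inflated-variance normal and using the posterior probability of the "good" part to flag outliers. All function names and parameter values are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def contaminated_normal_pdf(x, mean, var, alpha, eta):
    """Mixture of a 'good' normal (weight alpha) and a normal with the
    same mean but variance inflated by eta > 1 (weight 1 - alpha)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if eta <= 1.0:
        raise ValueError("eta must exceed 1")
    good = alpha * normal_pdf(x, mean, var)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return good + bad


def prob_good(x, mean, var, alpha, eta):
    """Posterior probability that x came from the 'good' part.
    Values below 0.5 are a common rule of thumb for flagging outliers."""
    good = alpha * normal_pdf(x, mean, var)
    return good / contaminated_normal_pdf(x, mean, var, alpha, eta)


# Illustrative values (assumptions): 90% good points, variance inflated 10x.
central = prob_good(0.0, mean=0.0, var=1.0, alpha=0.9, eta=10.0)
extreme = prob_good(5.0, mean=0.0, var=1.0, alpha=0.9, eta=10.0)
```

With these assumed values, a point at the centre is attributed to the "good" part with high probability, while a point five standard deviations away is attributed almost entirely to the inflated part.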

Citations: 0

Maximum likelihood estimation under the Emax model: existence, geometry and efficiency.

IF 1.2, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2025-01-01 Epub Date: 2025-06-10 DOI: 10.1007/s00362-025-01673-2

Giacomo Aletti, Nancy Flournoy, Caterina May, Chiara Tommasi

Citations: 0

Local linear smoothing for regression surfaces on the simplex using Dirichlet kernels.

IF 1.2, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2025-01-01 Epub Date: 2025-05-14 DOI: 10.1007/s00362-025-01708-8

Christian Genest, Frédéric Ouimet

Citations: 0

The distribution of power-related random variables (and their use in clinical trials)

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-19 DOI: 10.1007/s00362-024-01599-1

Francesco Mariani, Fulvio De Santis, Stefania Gubbiotti

Citations: 0

The cost of sequential adaptation and the lower bound for mean squared error

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-17 DOI: 10.1007/s00362-024-01565-x

Sergey Tarima, Nancy Flournoy

Abstract: Informative interim adaptations lead to random sample sizes. The random sample size becomes a component of the sufficient statistic, and estimation based solely on observed samples or on the likelihood function does not use all available statistical evidence. The total Fisher Information (FI) is decomposed into the design FI and a conditional-on-design FI. The FI unspent by a design’s informative interim adaptation decomposes further into a weighted linear combination of FIs conditional-on-stopping decisions. Then, these components are used to determine the new lower bound on the mean squared error (MSE) in post-adaptation estimation, because the Cramér–Rao lower bound (1945, 1946) and its sequential version suggested by Wolfowitz (Ann Math Stat 18(2):215–230, 1947) for non-informative stopping are not applicable to post-informative-adaptation estimation. In addition, we show that the newly proposed lower bound on the MSE is reached by the maximum likelihood estimators in designs with informative adaptations when data come from a one-parameter exponential family. Theoretical results are illustrated with simple normal samples collected according to a two-stage design with a possibility of early stopping.
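The abstract's opening claim, that an informative interim adaptation makes the sample size random, can be made concrete with a small calculation. The sketch below assumes a hypothetical stopping rule that the abstract does not specify: stop after stage 1 if the stage-1 sample mean exceeds a cutoff. Under that assumption the stopping probability, and hence the expected sample size, depends on the unknown mean, which is why the sample size itself carries information. All names and values are illustrative.

```python
import math


def standard_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stop_probability(mu, sigma, n1, cutoff):
    """P(stage-1 mean > cutoff) for n1 i.i.d. N(mu, sigma^2) observations.
    The rule 'stop early if the stage-1 mean exceeds cutoff' is an assumption."""
    z = (cutoff - mu) * math.sqrt(n1) / sigma
    return 1.0 - standard_normal_cdf(z)


def expected_sample_size(mu, sigma, n1, n2, cutoff):
    """E[N] = n1 + P(continue) * n2 for the two-stage design."""
    p_stop = stop_probability(mu, sigma, n1, cutoff)
    return n1 + (1.0 - p_stop) * n2


# The expected sample size changes with the unknown mean mu.
size_at_cutoff = expected_sample_size(mu=0.0, sigma=1.0, n1=25, n2=25, cutoff=0.0)
size_above_cutoff = expected_sample_size(mu=0.5, sigma=1.0, n1=25, n2=25, cutoff=0.0)
```

When the true mean equals the cutoff, stopping and continuing are equally likely, so the expected sample size sits halfway between the one-stage and two-stage totals; a larger mean makes early stopping more likely and shrinks it.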

Citations: 0

Nested strong orthogonal arrays

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-16 DOI: 10.1007/s00362-024-01609-2

Chunwei Zheng, Wenlong Li, Jian-Feng Yang

Citations: 0

Tests for time-varying coefficient spatial autoregressive panel data model with fixed effects

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-14 DOI: 10.1007/s00362-024-01607-4

Lingling Tian, Yunan Su, Chuanhua Wei

Abstract: As an extension of the spatial autoregressive panel data model and the time-varying coefficient panel data model, the time-varying coefficient spatial autoregressive panel data model is useful in the analysis of spatial panel data. While research has addressed the estimation problem of this model, less attention has been given to hypothesis tests. This paper studies two tests for this semiparametric spatial panel data model. One considers the existence of the spatial lag term, and the other determines whether some time-varying coefficients are constants. We employ the profile generalized likelihood ratio test procedure to construct the corresponding test statistic, and the residual-based bootstrap procedure is used to derive the p-value of the tests. Simulations are conducted to evaluate the performance of the proposed test methods; the results show that they have good finite sample properties. Finally, we apply the proposed test methods to the provincial carbon emission data of China. Our findings suggest that the partially linear time-varying coefficients spatial autoregressive panel data model provides a better fit for the carbon emission data.

Citations: 0

On the consistency of supervised learning with missing values

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-12 DOI: 10.1007/s00362-024-01550-4

Julie Josse, Jacob M. Chen, Nicolas Prost, Gaël Varoquaux, Erwan Scornet

Abstract: In many application settings, data have missing entries, which makes subsequent analyses challenging. An abundant literature addresses missing values in an inferential framework, aiming at estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and test data. We first rewrite classic missing values results for this setting. We then show the consistency of two approaches, test-time multiple imputation and single imputation in prediction. A striking result is that the widely-used method of imputing with a constant prior to learning is consistent when missing values are not informative. This contrasts with inferential settings where mean imputation is frowned upon as it distorts the distribution of the data. The consistency of such a popular simple approach is important in practice. Finally, to contrast procedures based on imputation prior to learning with procedures that optimize the missing-value handling for prediction, we consider decision trees. Indeed, decision trees are among the few methods that can tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing empirically different missing values strategies in trees, we recommend using the “missing incorporated in attribute” method as it can handle both non-informative and informative missing values.
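The "imputing with a constant prior to learning" step discussed in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's experimental code: missing entries are encoded as NaN, the constants are column means computed on the training data only, and the same constants are reused at test time. The helper names and the toy data are assumptions, and each column is assumed to have at least one observed training value.

```python
import math


def column_means(rows):
    """Per-column mean over observed (non-NaN) entries.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return means


def impute_constant(rows, constants):
    """Replace each NaN with the constant chosen for its column."""
    return [
        [constants[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


nan = float("nan")
train = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]
test = [[nan, 5.0]]

constants = column_means(train)              # learned on training data only
train_filled = impute_constant(train, constants)
test_filled = impute_constant(test, constants)  # same constants at test time
```

The filled tables can then be passed to any learner that expects complete inputs; the design choice worth noting is that the test set never contributes to the imputation constants.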

Citations: 0

Maximum likelihood estimation for left-truncated log-logistic distributions with a given truncation point

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-10 DOI: 10.1007/s00362-024-01603-8

Markus Kreer, Ayşe Kızılersü, Jake Guscott, Lukas Christopher Schmitz, Anthony W. Thomas

Abstract: For a sample \(X_1, X_2, \ldots, X_N\) of independent identically distributed copies of a log-logistically distributed random variable \(X\) the maximum likelihood estimation is analysed in detail if a left-truncation point \(x_L > 0\) is introduced. Due to scaling properties it is sufficient to investigate the case \(x_L = 1\). Here the corresponding maximum likelihood equations for a normalised sample (i.e. a sample divided by \(x_L\)) do not always possess a solution. A simple criterion guarantees the existence of a solution: Let \(\mathbb{E}(\cdot)\) denote the expectation induced by the normalised sample and denote by \(\beta_0 = \mathbb{E}(\ln X)^{-1}\) the inverse value of expectation of the logarithm of the sampled random variable \(X\) (which is greater than \(x_L = 1\)). If this value \(\beta_0\) is bigger than a certain positive number \(\beta_C\) then a solution of the maximum likelihood equation exists. Here the number \(\beta_C\) is the unique solution of a moment equation, \(\mathbb{E}(X^{-\beta_C}) = \frac{1}{2}\). In the case of existence a profile likelihood function can be constructed and the optimisation problem is reduced to one dimension leading to a robust numerical algorithm. When the maximum likelihood equations do not admit a solution for certain data samples, it is shown that the Pareto distribution is the \(L^1\)-limit of the degenerated left-truncated log-logistic distribution, where \(L^1(\mathbb{R}^+)\) is the usual Banach space of functions whose absolute value is Lebesgue-integrable. A large sample analysis showing consistency and asymptotic normality complements our analysis. Finally, two applications to real world data are presented.
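The existence criterion stated in the abstract can be checked numerically. The sketch below reads the expectation as the empirical mean over the normalised sample, computes the inverse mean of the log values (beta_0) directly, and finds beta_C by bisection on the moment equation, whose left-hand side is strictly decreasing in the exponent when every normalised value exceeds 1. It only checks the sufficient condition beta_0 > beta_C; it does not fit the distribution, and the function names are illustrative assumptions rather than the authors' code.

```python
import math


def normalise(sample, x_l):
    """Divide by the truncation point; every value must exceed it."""
    if x_l <= 0 or any(x <= x_l for x in sample):
        raise ValueError("need x_l > 0 and all observations above x_l")
    return [x / x_l for x in sample]


def beta_0(normalised):
    """Inverse of the empirical mean of ln(X)."""
    return len(normalised) / sum(math.log(x) for x in normalised)


def beta_c(normalised, iterations=200):
    """Solve mean(X ** -b) = 1/2 for b by bisection.
    g(b) = mean(X ** -b) - 1/2 falls from 1/2 at b = 0 towards -1/2."""
    def g(b):
        return sum(x ** (-b) for x in normalised) / len(normalised) - 0.5

    low, high = 0.0, 1.0
    while g(high) > 0.0:      # expand until the root is bracketed
        high *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if g(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def existence_criterion_holds(sample, x_l):
    """True when beta_0 > beta_C, the sufficient condition in the abstract."""
    normalised = normalise(sample, x_l)
    return beta_0(normalised) > beta_c(normalised)


tight = existence_criterion_holds([2.0 * math.e, 2.0 * math.e], x_l=2.0)
spread = existence_criterion_holds([1.0001, 1000.0], x_l=1.0)
```

A fixed number of bisection steps is used instead of a tolerance so the loop always terminates, even when the bracketing interval is wide.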

Citations: 0

Confidence bounds for compound Poisson process

IF 1.3, Zone 3 (Mathematics)

Statistical Papers Pub Date : 2024-09-05 DOI: 10.1007/s00362-024-01604-7

Marek Skarupski, Qinhao Wu

Citations: 0