Asta-Advances in Statistical Analysis: latest articles, page 5

A family of consistent normally distributed tests for Poissonity

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-06-15 DOI: 10.1007/s10182-023-00478-8

Antonio Di Noia, Marzia Marcheselli, Caterina Pisani, Luca Pratelli

Citations: 0

Correlation-type goodness-of-fit tests based on independence characterizations

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-05-04 DOI: 10.1007/s10182-023-00475-x

Katarina Halaj, Bojana Milošević, Marko Obradović, M. Dolores Jiménez-Gamero

Citations: 0

Conditional feature importance for mixed data

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-04-29 DOI: 10.1007/s10182-023-00477-9

Kristin Blesch, David S. Watson, Marvin N. Wright

Abstract: Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analysing a variable's importance before and after adjusting for covariates, i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. We find that few methods are available for testing conditional FI and practitioners have hitherto been severely restricted in method application due to mismatched data requirements. Most real-world data exhibit complex feature dependencies and incorporate both continuous and categorical features (i.e., mixed data). Both properties are oftentimes neglected by conditional FI measures. To fill this gap, we propose to combine the conditional predictive impact (CPI) framework with sequential knockoff sampling. The CPI enables conditional FI measurement that controls for any feature dependencies by sampling valid knockoffs (that is, synthetic data with similar statistical properties) for the data to be analysed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls type I error, achieves high power, and is in line with results given by other conditional FI measures, whereas marginal FI metrics can result in misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.

Vol. 108 (2), pp. 259-278. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00477-9.pdf

Citations: 0

Clustering of extreme values: estimation and application

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-03-31 DOI: 10.1007/s10182-023-00474-y

Marta Ferreira

Abstract: Extreme value theory (EVT) encompasses a set of methods for inference about the risk inherent in various phenomena in the economic, financial, actuarial, environmental, hydrological and climatic sciences, as well as in various areas of engineering. In many situations the clustering of high values may affect the risk of occurrence of extreme phenomena. Examples include extreme temperatures that persist and result in drought, persistent intense rain leading to floods, and stock markets in successive falls with consequent catastrophic losses. The extremal index is an EVT measure of the degree of clustering of extreme values. In many situations, and under certain conditions, it corresponds to the reciprocal of the average size of high-value clusters. The estimation of the extremal index generally entails two sources of uncertainty: the level above which observations are considered high and the identification of clusters. There are several contributions in the literature on the estimation of the extremal index, including methodologies to overcome these sources of uncertainty. In this work we revisit several existing estimators, apply automatic choice methods, both for the threshold and for the clustering parameter, and compare the performance of the methods. We end with an application to meteorological data.

Vol. 108 (1), pp. 101-125. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10064624/pdf/
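
The description of the extremal index as the reciprocal of the mean cluster size can be illustrated with a small sketch. The code below is not taken from the paper; it is a minimal runs-declustering style estimate, assuming a fixed threshold `u` and a run length `r` (both hypothetical tuning parameters, which are exactly the two sources of uncertainty the abstract mentions), and the toy series is invented for illustration.

```python
def runs_extremal_index(x, u, r):
    """Estimate the extremal index as (number of clusters) / (number of exceedances).

    Two exceedances of the threshold u belong to the same cluster unless they
    are separated by at least r consecutive non-exceedances.
    """
    exceed_idx = [i for i, value in enumerate(x) if value > u]
    if not exceed_idx:
        raise ValueError("no observation exceeds the threshold u")

    clusters = 1  # the first exceedance always opens a cluster
    for prev, cur in zip(exceed_idx, exceed_idx[1:]):
        # cur - prev - 1 non-exceedances lie strictly between the two exceedances
        if cur - prev - 1 >= r:
            clusters += 1

    return clusters / len(exceed_idx)


# Hypothetical toy series: 5 exceedances of u = 4 grouped into 3 clusters.
series = [0, 5, 6, 0, 0, 0, 7, 0, 8, 0, 0, 0, 0, 9]
theta_hat = runs_extremal_index(series, u=4, r=2)
mean_cluster_size = 1 / theta_hat
```

Values of the estimate close to 1 indicate that exceedances occur in isolation, while smaller values indicate stronger clustering. In practice both `u` and `r` would have to be chosen from the data.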

Citations: 0

A spatial semiparametric M-quantile regression for hedonic price modelling

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-03-30 DOI: 10.1007/s10182-023-00476-w

Francesco Schirripa Spagnolo, Riccardo Borgoni, Antonella Carcagnì, Alessandra Michelangeli, Nicola Salvati

Abstract: This paper proposes an M-quantile regression approach to address the heterogeneity of the housing market in a modern European city. We show how M-quantile modelling is a rich and flexible tool for empirical market price data analysis, allowing us to obtain a robust estimation of the hedonic price function whilst accounting for different sources of heterogeneity in market prices. The suggested methodology can generally be used to analyse nonlinear interactions between prices and predictors. In particular, we develop a spatial semiparametric M-quantile model to capture both the potential nonlinear effects of the cultural environment on pricing and spatial trends. In both cases, nonlinearity is introduced into the model using appropriate basis functions. We show how the implicit price associated with the variable that measures cultural amenities can be determined in this semiparametric framework. Our findings show that the effect of several housing attributes and urban amenities differs significantly across the response distribution, suggesting that buyers of lower-priced properties behave differently than buyers of higher-priced properties.

Vol. 108 (1), pp. 159-183. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00476-w.pdf

Citations: 0

Robust estimation of fixed effect parameters and variances of linear mixed models: the minimum density power divergence approach

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-03-29 DOI: 10.1007/s10182-023-00473-z

Giovanni Saraceno, Abhik Ghosh, Ayanendranath Basu, Claudio Agostinelli

Abstract: Many real-life data sets can be analyzed using linear mixed models (LMMs). Since these are ordinarily based on normality assumptions, under small deviations from the model the inference can be highly unstable when the associated parameters are estimated by classical methods. On the other hand, the density power divergence (DPD) family, which measures the discrepancy between two probability density functions, has been successfully used to build robust estimators with high stability associated with minimal loss in efficiency. Here, we develop the minimum DPD estimator (MDPDE) for independent but non-identically distributed observations for LMMs according to the variance components model. We prove that the theoretical properties hold, including consistency and asymptotic normality of the estimators. The influence function and sensitivity measures are computed to explore the robustness properties. As a data-based choice of the MDPDE tuning parameter α is very important, we propose two candidates as “optimal” choices, where optimality is in the sense of choosing the strongest downweighting that is necessary for the particular data set. We conduct a simulation study comparing the proposed MDPDE, for different values of α, with S-estimators, M-estimators and the classical maximum likelihood estimator, considering different levels of contamination. Finally, we illustrate the performance of our proposal on a real-data example.

Vol. 108 (1), pp. 127-157. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00473-z.pdf
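
For readers unfamiliar with the divergence named in the abstract, the standard definition of the density power divergence between a data density g and a model density f_θ is sketched below, together with the empirical objective usually minimized for independent, non-identically distributed observations. The notation (g, f_θ, f_i, H_n) is introduced here for illustration and is not quoted from the paper.

```latex
% Density power divergence with tuning parameter \alpha > 0
d_\alpha(g, f_\theta)
  = \int f_\theta^{1+\alpha}(y)\,dy
  - \Bigl(1 + \tfrac{1}{\alpha}\Bigr) \int g(y)\, f_\theta^{\alpha}(y)\,dy
  + \tfrac{1}{\alpha} \int g^{1+\alpha}(y)\,dy .

% Empirical objective for independent observations y_1, \dots, y_n,
% where y_i has model density f_i(\cdot\,;\theta); the last term of
% d_\alpha does not depend on \theta and is dropped.
H_n(\theta)
  = \frac{1}{n} \sum_{i=1}^{n}
    \Bigl[ \int f_i^{1+\alpha}(y;\theta)\,dy
         - \Bigl(1 + \tfrac{1}{\alpha}\Bigr) f_i^{\alpha}(y_i;\theta) \Bigr],
\qquad
\hat{\theta}_\alpha = \arg\min_{\theta} H_n(\theta).
```

As α tends to 0 the divergence approaches the Kullback-Leibler divergence, so the estimator approaches maximum likelihood; larger α downweights observations with low model density more strongly, which is the robustness and efficiency trade-off the abstract refers to.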

Citations: 0

Lasso-based variable selection methods in text regression: the case of short texts

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-03-20 DOI: 10.1007/s10182-023-00472-0

Marzia Freo, Alessandra Luati

Abstract: Communication through websites is often characterised by short texts, made of few words, such as image captions or tweets. This paper explores the class of supervised learning methods for the analysis of short texts, as an alternative to unsupervised methods, widely employed to infer topics from structured texts. The aim is to assess the effectiveness of text data in social sciences, when they are used as explanatory variables in regression models. To this purpose, we compare different variable selection procedures when text regression models are fitted to real, short, text data. We discuss the results obtained by several variants of lasso, screening-based methods and randomisation-based models, such as sure independence screening and stability selection, in terms of number and importance of selected variables, assessed through goodness-of-fit measures, inclusion frequency and model class reliance. Latent Dirichlet allocation results are also considered as a term of comparison. Our perspective is primarily empirical and our starting point is the analysis of two real case studies, though bootstrap replications of each dataset are considered. The first case study aims at explaining price variations based on the information contained in the description of items on sale on e-commerce platforms. The second regards open questions in surveys on satisfaction ratings. The case studies are different in nature and representative of different kinds of short texts, as, in one case, a concise descriptive text is considered, whereas, in the other case, the text expresses an opinion.

Vol. 108 (1), pp. 69-99. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00472-0.pdf
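
To make the idea of lasso-based variable selection in a text regression concrete, here is a minimal sketch of the plain lasso fitted by coordinate descent on a tiny word-indicator matrix. It is not the authors' pipeline and covers none of the lasso variants, screening or stability-selection procedures compared in the paper; the function names, the penalty value and the toy data are all hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho towards zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 without an intercept.

    X is a list of rows (documents) whose columns are term indicators or counts.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical toy data: 4 short texts, 2 terms; the response could be a price.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2.0, 0.1, 2.0, 0.1]
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
selected_terms = [j for j, b in enumerate(beta_hat) if b != 0.0]
```

With this penalty the coefficient of the second term is shrunk exactly to zero, so only the first term is selected. Counting how often a term is selected across bootstrap replications gives an inclusion frequency of the kind the abstract mentions.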

Citations: 0

Correction: Bayesian ridge regression for survival data based on a vine copula-based prior

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-02-14 DOI: 10.1007/s10182-023-00470-2

Hirofumi Michimae, Takeshi Emura

Citations: 0

A dynamic causal modeling of the second outbreak of COVID-19 in Italy

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-02-07 DOI: 10.1007/s10182-023-00469-9

Massimo Bilancia, Domenico Vitale, Fabio Manca, Paola Perchinunno, Luigi Santacroce

Citations: 0

Left-truncated health insurance claims data: theoretical review and empirical application

IF 1.4, Zone 4 (Mathematics)

Asta-Advances in Statistical Analysis Pub Date : 2023-02-02 DOI: 10.1007/s10182-023-00471-1

Rafael Weißbach, Achim Dörre, Dominik Wied, Gabriele Doblhammer, Anne Fink

Abstract: From the inventory of the health insurer AOK in 2004, we draw a sample of a quarter million people and follow each person's health claims continuously until 2013. Our aim is to estimate the effect of a stroke on the dementia onset probability for Germans born in the first half of the 20th century. People deceased before 2004 are randomly left-truncated and, in particular, their number is unknown. Filtrations, modelling the missing data, enable circumventing the unknown number of truncated persons by using a conditional likelihood. Dementia onset after 2013 is a fixed right-censoring event. For each observed health history, Jacod's formula yields its conditional likelihood contribution. Asymptotic normality of the estimated intensities is derived, related to a sample size definition including the number of truncated people. The standard error results from the asymptotic normality and is easily computable, despite the unknown sample size. The claims data reveal that after a stroke, with time measured in years, the intensity of dementia onset increases from 0.02 to 0.07. Using the independence of the two estimated intensities, a 95% confidence interval for their difference is [0.053, 0.057]. The effect halves when we extend the analysis to an age-inhomogeneous model, but does not change further when we additionally adjust for multi-morbidity.

Vol. 108 (1), pp. 31-68. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00471-1.pdf
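
The step from two independent, asymptotically normal intensity estimates to a confidence interval for their difference can be sketched as follows. This is a generic Wald interval, assuming independence so that the variances add. The standard errors used below are hypothetical placeholders, not values from the paper, and the example does not attempt to reproduce the interval reported in the abstract.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def wald_ci_for_difference(est_before, se_before, est_after, se_after, z=Z_975):
    """Wald confidence interval for est_after - est_before.

    Assumes the two estimators are independent and approximately normal,
    so the variance of the difference is the sum of the two variances.
    """
    diff = est_after - est_before
    se_diff = math.sqrt(se_before ** 2 + se_after ** 2)
    return diff - z * se_diff, diff + z * se_diff


# Hypothetical inputs: intensities as rounded in the abstract, invented standard errors.
lower, upper = wald_ci_for_difference(0.02, 0.003, 0.07, 0.004)
```

An interval that excludes zero indicates a difference in dementia-onset intensity before and after a stroke at the 5% level, under the stated independence and normality assumptions.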

Citations: 0