Asta-Advances in Statistical Analysis: Latest Articles

Lasso-based variable selection methods in text regression: the case of short texts
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2023-03-20 DOI: 10.1007/s10182-023-00472-0
Marzia Freo, Alessandra Luati
Abstract: Communication through websites is often characterised by short texts, made of few words, such as image captions or tweets. This paper explores the class of supervised learning methods for the analysis of short texts, as an alternative to unsupervised methods, widely employed to infer topics from structured texts. The aim is to assess the effectiveness of text data in social sciences, when they are used as explanatory variables in regression models. To this purpose, we compare different variable selection procedures when text regression models are fitted to real, short, text data. We discuss the results obtained by several variants of lasso, screening-based methods and randomisation-based models, such as sure independence screening and stability selection, in terms of number and importance of selected variables, assessed through goodness-of-fit measures, inclusion frequency and model class reliance. Latent Dirichlet allocation results are also considered as a term of comparison. Our perspective is primarily empirical and our starting point is the analysis of two real case studies, though bootstrap replications of each dataset are considered. The first case study aims at explaining price variations based on the information contained in the description of items on sale on e-commerce platforms. The second regards open questions in surveys on satisfaction ratings. The case studies are different in nature and representative of different kinds of short texts, as, in one case, a concise descriptive text is considered, whereas, in the other case, the text expresses an opinion.
Volume 108, Issue 1, pp. 69-99. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00472-0.pdf
Citations: 0
Correction: Bayesian ridge regression for survival data based on a vine copula-based prior
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2023-02-14 DOI: 10.1007/s10182-023-00470-2
Hirofumi Michimae, Takeshi Emura
Abstract: (none provided)
Volume 108, Issue 3, p. 703.
Citations: 0
A dynamic causal modeling of the second outbreak of COVID-19 in Italy
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2023-02-07 DOI: 10.1007/s10182-023-00469-9
Massimo Bilancia, Domenico Vitale, Fabio Manca, Paola Perchinunno, Luigi Santacroce
Abstract: While the vaccination campaign against COVID-19 is having its positive impact, we retrospectively analyze the causal impact of some decisions made by the Italian government on the second outbreak of the SARS-CoV-2 pandemic in Italy, when no vaccine was available. First, we analyze the causal impact of reopenings after the first lockdown in 2020. In addition, we also analyze the impact of reopening schools in September 2020. Our results provide an unprecedented opportunity to evaluate the causal relationship between the relaxation of restrictions and the transmission in the community of a highly contagious respiratory virus that causes severe illness in the absence of prophylactic vaccination programs. We present a purely data-analytic approach based on a Bayesian methodology and discuss possible interpretations of the results obtained and implications for policy makers.
Volume 108, Issue 1, pp. 1-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904269/pdf/
Citations: 0
Left-truncated health insurance claims data: theoretical review and empirical application
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2023-02-02 DOI: 10.1007/s10182-023-00471-1
Rafael Weißbach, Achim Dörre, Dominik Wied, Gabriele Doblhammer, Anne Fink
Abstract: From the inventory of the health insurer AOK in 2004, we draw a sample of a quarter million people and follow each person’s health claims continuously until 2013. Our aim is to estimate the effect of a stroke on the dementia onset probability for Germans born in the first half of the 20th century. People deceased before 2004 are randomly left-truncated, and especially their number is unknown. Filtrations, modelling the missing data, enable circumventing the unknown number of truncated persons by using a conditional likelihood. Dementia onset after 2013 is a fixed right-censoring event. For each observed health history, Jacod’s formula yields its conditional likelihood contribution. Asymptotic normality of the estimated intensities is derived, related to a sample size definition including the number of truncated people. The standard error results from the asymptotic normality and is easily computable, despite the unknown sample size. The claims data reveal that after a stroke, with time measured in years, the intensity of dementia onset increases from 0.02 to 0.07. Using the independence of the two estimated intensities, a 95% confidence interval for their difference is [0.053, 0.057]. The effect halves when we extend the analysis to an age-inhomogeneous model, but does not change further when we additionally adjust for multi-morbidity.
Volume 108, Issue 1, pp. 31-68. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00471-1.pdf
Citations: 0
Statistical guarantees for sparse deep learning
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2023-01-24 DOI: 10.1007/s10182-022-00467-3
Johannes Lederer
Abstract: Neural networks are becoming increasingly popular in applications, but our mathematical understanding of their potential and limitations is still limited. In this paper, we further this understanding by developing statistical guarantees for sparse deep learning. In contrast to previous work, we consider different types of sparsity, such as few active connections, few active nodes, and other norm-based types of sparsity. Moreover, our theories cover important aspects that previous theories have neglected, such as multiple outputs, regularization, and $\ell_{2}$-loss. The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective. Some of the concepts and tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.
Volume 108, Issue 2, pp. 231-258. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00467-3.pdf
Citations: 0
Addressing non-normality in multivariate analysis using the t-distribution
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2023-01-21 DOI: 10.1007/s10182-022-00468-2
Felipe Osorio, Manuel Galea, Claudio Henríquez, Reinaldo Arellano-Valle
Abstract: The main aim of this paper is to propose a set of tools for assessing non-normality taking into consideration the class of multivariate t-distributions. Assuming second moment existence, we consider a reparameterized version of the usual t-distribution, so that the scale matrix coincides with the covariance matrix of the distribution. We use the local influence procedure and the Kullback–Leibler divergence measure to propose quantitative methods to evaluate deviations from the normality assumption. In addition, the possible non-normality due to the presence of both skewness and heavy tails is also explored. Our findings based on two real datasets are complemented by a simulation study to evaluate the performance of the proposed methodology on finite samples.
Volume 107, Issue 4, pp. 785-813.
Citations: 0
Bayesian ridge regression for survival data based on a vine copula-based prior
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2022-12-30 DOI: 10.1007/s10182-022-00466-4
Hirofumi Michimae, Takeshi Emura
Abstract: Ridge regression estimators can be interpreted as a Bayesian posterior mean (or mode) when the regression coefficients follow a multivariate normal prior. However, the multivariate normal prior may not give efficient posterior estimates for regression coefficients, especially in the presence of interaction terms. In this paper, vine copula-based priors are proposed for Bayesian ridge estimators under the Cox proportional hazards model. The semiparametric Cox models are built on the posterior density under two likelihoods: Cox’s partial likelihood and the full likelihood under the gamma process prior. The simulations show that the full likelihood is generally more efficient and stable for estimating regression coefficients than the partial likelihood. We also show via simulations and a data example that the Archimedean copula priors (the Clayton and Gumbel copulas) are superior to the multivariate normal prior and the Gaussian copula prior.
Volume 107, Issue 4, pp. 755-784.
Citations: 0
Hedonic pricing modelling with unstructured predictors: an application to Italian Fashion Industry
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2022-12-13 DOI: 10.1007/s10182-022-00465-5
Federico Crescenzi
Abstract: This study proposes a comparison of hedonic pricing models that use attributes obtained by featurizing text. We collected prices of items sold on the websites of five famous fashion producers in order to estimate hedonic pricing models that leverage the information contained in product descriptions. We mapped product descriptions to a high-dimensional feature space and compared predictive accuracy and variable selection properties of some statistical estimators that leverage sparse modelling, topic modelling and aggregated predictors, to test whether better predictive accuracy comes with an empirically consistent selection of attributes. We call this approach Hedonic Text-Regression modelling. Its novelty is that by using attributes obtained by text-mining of product descriptions, we obtain an estimate of the implicit price of the words contained therein. Empirically, all the proposed models outperformed the traditional hedonic pricing model in terms of predictive accuracy, while also providing consistent variable selection.
Volume 107, Issue 4, pp. 733-753.
Citations: 0
Estimating the Impact of Medical Care Usage on Work Absenteeism by a Trivariate Probit Model with Two Binary Endogenous Explanatory Variables
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2022-10-18 DOI: 10.1007/s10182-022-00464-6
Panagiota Filippou, Giampiero Marra, Rosalba Radice, David Zimmer
Abstract: The aim of this paper is to estimate the effects of seeking medical care on missing work. Specifically, our case study explores the question: Does visiting a medical provider cause an employee to miss work? To address this, we employ a model that can consistently estimate the impacts of two endogenous binary regressors. The model is based on three equations connected via a multivariate Gaussian distribution, which makes it possible to model the correlations among the equations, hence accounting for unobserved heterogeneity. Parameter estimation is reliably carried out via a trust region algorithm with analytical derivative information. We find that, observationally, having a curative visit associates with a nearly 80% increase in the probability of missing work, while having a preventive visit correlates with a smaller 13% increase in the likelihood of missing work. However, after addressing potential endogeneity, neither type of visit appears to significantly relate to missing work. That finding also applies to visits that occur during the previous year. Therefore, we conclude that the observed links between medical usage and absenteeism derive from unobserved heterogeneity, rather than direct causal channels. The modeling framework is available through the R package GJRM.
Volume 107, Issue 4, pp. 713-731.
Citations: 0
Control charts for measurement error models
IF 1.4, Zone 4, Mathematics
Asta-Advances in Statistical Analysis Pub Date : 2022-10-05 DOI: 10.1007/s10182-022-00462-8
Vasyl Golosnoy, Benno Hildebrandt, Steffen Köhler, Wolfgang Schmid, Miriam Isabel Seifert
Abstract: We consider a linear measurement error model (MEM) with AR(1) process in the state equation which is widely used in applied research. This MEM could be equivalently re-written as ARMA(1,1) process, where the MA(1) parameter is related to the variance of measurement errors. As the MA(1) parameter is of essential importance for these linear MEMs, it is of much relevance to provide instruments for online monitoring in order to detect its possible changes. In this paper we develop control charts for online detection of such changes, i.e., from AR(1) to ARMA(1,1) and vice versa, as soon as they occur. For this purpose, we elaborate on both cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) control charts and investigate their performance in a Monte Carlo simulation study. The empirical illustration of our approach is conducted based on time series of daily realized volatilities.
Volume 107, Issue 4, pp. 693-712. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9533293/pdf/
Citations: 0