{"title":"Left-truncated health insurance claims data: theoretical review and empirical application","authors":"Rafael Weißbach, Achim Dörre, Dominik Wied, Gabriele Doblhammer, Anne Fink","doi":"10.1007/s10182-023-00471-1","DOIUrl":"10.1007/s10182-023-00471-1","url":null,"abstract":"<div><p>From the inventory of the health insurer AOK in 2004, we draw a sample of a quarter million people and follow each person’s health claims continuously until 2013. Our aim is to estimate the effect of a stroke on the dementia onset probability for Germans born in the first half of the 20th century. People deceased before 2004 are randomly left-truncated, and especially their number is unknown. Filtrations, modelling the missing data, enable circumventing the unknown number of truncated persons by using a conditional likelihood. Dementia onset after 2013 is a fixed right-censoring event. For each observed health history, Jacod’s formula yields its conditional likelihood contribution. Asymptotic normality of the estimated intensities is derived, related to a sample size definition including the number of truncated people. The standard error results from the asymptotic normality and is easily computable, despite the unknown sample size. The claims data reveal that after a stroke, with time measured in years, the intensity of dementia onset increases from 0.02 to 0.07. Using the independence of the two estimated intensities, a 95% confidence interval for their difference is [0.053, 0.057]. The effect halves when we extend the analysis to an age-inhomogeneous model, but does not change further when we additionally adjust for multi-morbidity.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-023-00471-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42282189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical guarantees for sparse deep learning","authors":"Johannes Lederer","doi":"10.1007/s10182-022-00467-3","DOIUrl":"10.1007/s10182-022-00467-3","url":null,"abstract":"<div><p>Neural networks are becoming increasingly popular in applications, but our mathematical understanding of their potential and limitations is still limited. In this paper, we further this understanding by developing statistical guarantees for sparse deep learning. In contrast to previous work, we consider different types of sparsity, such as few active connections, few active nodes, and other norm-based types of sparsity. Moreover, our theories cover important aspects that previous theories have neglected, such as multiple outputs, regularization, and <span>\\(\\ell_{2}\\)</span>-loss. The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective. Some of the concepts and tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00467-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136118419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing non-normality in multivariate analysis using the t-distribution","authors":"Felipe Osorio, Manuel Galea, Claudio Henríquez, Reinaldo Arellano-Valle","doi":"10.1007/s10182-022-00468-2","DOIUrl":"10.1007/s10182-022-00468-2","url":null,"abstract":"<div><p>The main aim of this paper is to propose a set of tools for assessing non-normality taking into consideration the class of multivariate <i>t</i>-distributions. Assuming second moment existence, we consider a reparameterized version of the usual <i>t</i> distribution, so that the scale matrix coincides with covariance matrix of the distribution. We use the local influence procedure and the Kullback–Leibler divergence measure to propose quantitative methods to evaluate deviations from the normality assumption. In addition, the possible non-normality due to the presence of both skewness and heavy tails is also explored. Our findings based on two real datasets are complemented by a simulation study to evaluate the performance of the proposed methodology on finite samples.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46365758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian ridge regression for survival data based on a vine copula-based prior","authors":"Hirofumi Michimae, Takeshi Emura","doi":"10.1007/s10182-022-00466-4","DOIUrl":"10.1007/s10182-022-00466-4","url":null,"abstract":"<div><p>Ridge regression estimators can be interpreted as a Bayesian posterior mean (or mode) when the regression coefficients follow multivariate normal prior. However, the multivariate normal prior may not give efficient posterior estimates for regression coefficients, especially in the presence of interaction terms. In this paper, the vine copula-based priors are proposed for Bayesian ridge estimators under the Cox proportional hazards model. The semiparametric Cox models are built on the posterior density under two likelihoods: Cox’s partial likelihood and the full likelihood under the gamma process prior. The simulations show that the full likelihood is generally more efficient and stable for estimating regression coefficients than the partial likelihood. We also show via simulations and a data example that the Archimedean copula priors (the Clayton and Gumbel copula) are superior to the multivariate normal prior and the Gaussian copula prior.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47123911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hedonic pricing modelling with unstructured predictors: an application to Italian Fashion Industry","authors":"Federico Crescenzi","doi":"10.1007/s10182-022-00465-5","DOIUrl":"10.1007/s10182-022-00465-5","url":null,"abstract":"<div><p>This study proposes a comparison of hedonic pricing models that use attributes obtained by featurizing text. We collected prices of items sold on the websites of five famous fashion producers in order to estimate hedonic pricing models that leverage the information contained in product descriptions. We mapped product descriptions to a high-dimensional feature space and compared predictive accuracy and variable selection properties of some statistical estimators that leverage sparse modelling, topic modelling and aggregated predictors, to test whether better predictive accuracy comes with an empirically consistent selection of attributes. We call this approach Hedonic Text-Regression modelling. Its novelty is that by using attributes obtained by text-mining of product descriptions, we obtain an estimate of the implicit price of the words contained therein. Empirically, all the proposed models outperformed the traditional hedonic pricing model in terms of predictive accuracy, while also providing consistent variable selection.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44348728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating the Impact of Medical Care Usage on Work Absenteeism by a Trivariate Probit Model with Two Binary Endogenous Explanatory Variables","authors":"Panagiota Filippou, Giampiero Marra, Rosalba Radice, David Zimmer","doi":"10.1007/s10182-022-00464-6","DOIUrl":"10.1007/s10182-022-00464-6","url":null,"abstract":"<div><p>The aim of this paper is to estimate the effects of seeking medical care on missing work. Specifically, our case study explores the question: Does visiting a medical provider cause an employee to miss work? To address this, we employ a model that can consistently estimate the impacts of two endogenous binary regressors. The model is based on three equations connected via a multivariate Gaussian distribution, which makes it possible to model the correlations among the equations, hence accounting for unobserved heterogeneity. Parameter estimation is reliably carried out via a trust region algorithm with analytical derivative information. We find that, observationally, having a curative visit associates with a nearly 80% increase in the probability of missing work, while having a preventive visit correlates with a smaller 13% increase in the likelihood of missing work. However, after addressing potential endogeneity, neither type of visit appears to significantly relate to missing work. That finding also applies to visits that occur during the previous year. Therefore, we conclude that the observed links between medical usage and absenteeism derive from unobserved heterogeneity, rather than direct causal channels. The modeling framework is available through the <span>R</span> package <span>GJRM</span>.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42881312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control charts for measurement error models","authors":"Vasyl Golosnoy, Benno Hildebrandt, Steffen Köhler, Wolfgang Schmid, Miriam Isabel Seifert","doi":"10.1007/s10182-022-00462-8","DOIUrl":"10.1007/s10182-022-00462-8","url":null,"abstract":"<div><p>We consider a linear measurement error model (MEM) with AR(1) process in the state equation which is widely used in applied research. This MEM could be equivalently re-written as ARMA(1,1) process, where the MA(1) parameter is related to the variance of measurement errors. As the MA(1) parameter is of essential importance for these linear MEMs, it is of much relevance to provide instruments for online monitoring in order to detect its possible changes. In this paper we develop control charts for online detection of such changes, i.e., from AR(1) to ARMA(1,1) and vice versa, as soon as they occur. For this purpose, we elaborate on both cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) control charts and investigate their performance in a Monte Carlo simulation study. The empirical illustration of our approach is conducted based on time series of daily realized volatilities.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9533293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33498201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sieve bootstrapping the memory parameter in long-range dependent stationary functional time series","authors":"Han Lin Shang","doi":"10.1007/s10182-022-00463-7","DOIUrl":"10.1007/s10182-022-00463-7","url":null,"abstract":"<div><p>We consider a sieve bootstrap procedure to quantify the estimation uncertainty of long-memory parameters in stationary functional time series. We use a semiparametric local Whittle estimator to estimate the long-memory parameter. In the local Whittle estimator, discrete Fourier transform and periodogram are constructed from the first set of principal component scores via a functional principal component analysis. The sieve bootstrap procedure uses a general vector autoregressive representation of the estimated principal component scores. It generates bootstrap replicates that adequately mimic the dependence structure of the underlying stationary process. We first compute the estimated first set of principal component scores for each bootstrap replicate and then apply the semiparametric local Whittle estimator to estimate the memory parameter. By taking quantiles of the estimated memory parameters from these bootstrap replicates, we can nonparametrically construct confidence intervals of the long-memory parameter. As measured by coverage probability differences between the empirical and nominal coverage probabilities at three levels of significance, we demonstrate the advantage of using the sieve bootstrap compared to the asymptotic confidence intervals based on normality.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00463-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46807934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributional properties of continuous time processes: from CIR to Bates","authors":"Ostap Okhrin, Michael Rockinger, Manuel Schmid","doi":"10.1007/s10182-022-00459-3","DOIUrl":"10.1007/s10182-022-00459-3","url":null,"abstract":"<div><p>In this paper, we compute closed-form expressions of moments and comoments for the CIR process which allows us to provide a new construction of the transition probability density based on a moment argument that differs from the historic approach. For Bates’ model with stochastic volatility and jumps, we show that finite difference approximations of higher moments such as the skewness and the kurtosis are unstable and, as a remedy, provide exact analytic formulas for log-returns. Our approach does not assume a constant mean for log-price differentials but correctly incorporates volatility resulting from Ito’s lemma. We also provide R, MATLAB, and Mathematica modules with exact implementations of the theoretical conditional and unconditional moments. These modules should prove useful for empirical research.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00459-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50046215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical disjoint principal component analysis","authors":"Carlo Cavicchia, Maurizio Vichi, Giorgia Zaccaria","doi":"10.1007/s10182-022-00458-4","DOIUrl":"10.1007/s10182-022-00458-4","url":null,"abstract":"<div><p>Dimension reduction, by means of Principal Component Analysis (PCA), is often employed to obtain a reduced set of components preserving the largest possible part of the total variance of the observed variables. Several methodologies have been proposed either to improve the interpretation of PCA results (e.g., by means of orthogonal, oblique rotations, shrinkage methods), or to model oblique components or factors with a hierarchical structure, such as in Bi-factor and High-Order Factor analyses. In this paper, we propose a new methodology, called Hierarchical Disjoint Principal Component Analysis (HierDPCA), that aims at building a hierarchy of disjoint principal components of maximum variance associated with disjoint groups of observed variables, from <i>Q</i> up to a unique, general one. HierDPCA also allows choosing the type of the relationship among disjoint principal components of two sequential levels, from the lowest upwards, by testing the component correlation per level and changing from a reflective to a formative approach when this correlation turns out to be not statistically significant. The methodology is formulated in a semi-parametric least-squares framework and a coordinate descent algorithm is proposed to estimate the model parameters. A simulation study and two real applications are illustrated to highlight the empirical properties of the proposed methodology.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00458-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42994839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}