{"title":"Machine Learning Approach for Analyzing Mixed Case Interval Censored Data with a Cured Subgroup.","authors":"Wisdom Aselisewine, Suvra Pal","doi":"10.1007/s10182-025-00544-3","DOIUrl":"10.1007/s10182-025-00544-3","url":null,"abstract":"<p>We introduce a novel two-component framework for analyzing mixed case interval censored (MCIC) data featuring a cured subgroup. In such data, the time-to-event is known only within certain intervals determined by multiple random examination time points. Moreover, a portion of the subjects will never experience the event. The first component of our model focuses on estimating the likelihood of being cured (incidence), departing from the conventional generalized linear model to adopt a more adaptable support vector machine (SVM) approach capable of accommodating complex or non-linear covariate effects. The second component addresses the survival distribution of the uncured individuals (latency) and employs a Cox proportional hazards structure to maintain the straightforward interpretation of covariate effects. We develop an expectation maximization algorithm, incorporating the Platt scaling method, to estimate the probability of being cured. Our simulation study demonstrates that our model outperforms both logit-based and spline-based models in capturing complex classification boundaries, leading to more accurate estimates of cured/uncured probabilities and enhanced predictive accuracy for cure. We emphasize that enhancing the estimation accuracy regarding incidence subsequently improves the estimation outcomes concerning latency. Finally, we illustrate the efficacy of our methodology by applying it to NASA's Hypobaric Decompression Sickness Data.</p>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12514071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145281887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust corrected empirical likelihood for partially linear measurement error models","authors":"Huihui Sun, Qiang Liu, Yuying Jiang","doi":"10.1007/s10182-024-00518-x","DOIUrl":"10.1007/s10182-024-00518-x","url":null,"abstract":"<div><p>This paper considers a partially linear model in which the covariates of the parametric part are measured with normally distributed errors. A new robust corrected empirical likelihood procedure based on the corrected score function is proposed to attenuate the effects of measurement errors as well as outliers. Moreover, thanks to the QR decomposition technique, the parametric and nonparametric components of the model can be estimated separately. The asymptotic properties of the proposed robust corrected empirical likelihood approach are established under some regularity conditions. Simulation studies show that the proposed method performs well in finite samples. Boston housing price data are used to illustrate the proposed estimation procedure.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 2","pages":"337 - 361"},"PeriodicalIF":1.4,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145168072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On random coefficient INAR processes with long memory","authors":"Jan Beran, Frieder Droullier","doi":"10.1007/s10182-025-00523-8","DOIUrl":"10.1007/s10182-025-00523-8","url":null,"abstract":"<div><p>We consider random coefficient INAR(1) processes with a strongly dependent latent random coefficient process. It is shown that, in spite of its conditional Markovian structure, the unconditional process exhibits long-range dependence. Short-term prediction and estimation of parameters involved in the prediction are considered. Asymptotic rates of convergence are derived.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 2","pages":"281 - 311"},"PeriodicalIF":1.4,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-025-00523-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145168071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond catastrophic payments: modeling household health expenditure shares with endogenous selection","authors":"Antonello Maruotti, Pierfrancesco Alaimo Di Loro, Cathleen Johnson","doi":"10.1007/s10182-024-00519-w","DOIUrl":"10.1007/s10182-024-00519-w","url":null,"abstract":"<div><p>The primary purpose of this paper is to assess households’ burden due to out-of-pocket healthcare expenditures. These payments are modeled on a representative sample of 25668 Italian households as the fraction of out-of-pocket healthcare expenditures over the households’ capacity to pay. For this purpose, we propose extending the analysis of the so-called catastrophic payments by looking at the entire distribution of this ratio. We introduce a novel finite mixture regression able to capture different levels of heterogeneity in the data. By using such a model specification, the fairness of the Italian National Health Service and its determinants are investigated.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 2","pages":"363 - 386"},"PeriodicalIF":1.4,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145167896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nowcasting GDP using machine learning methods","authors":"Dennis Kant, Andreas Pick, Jasper de Winter","doi":"10.1007/s10182-024-00515-0","DOIUrl":"10.1007/s10182-024-00515-0","url":null,"abstract":"<div><p>This paper compares the ability of several econometric and machine learning methods to nowcast GDP in (pseudo) real-time. The analysis takes the example of Dutch GDP over the period 1992Q1–2018Q4 using a broad data set of monthly indicators. It discusses the forecast accuracy but also analyzes the use of information from the large data set of macroeconomic and financial predictors. We find that, on average, the random forest provides the most accurate forecasts and nowcasts, whilst the dynamic factor model provides the most accurate backcasts.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 1","pages":"1 - 24"},"PeriodicalIF":1.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-024-00515-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143530002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change point detection in high dimensional covariance matrix using Pillai’s statistics","authors":"Seonghun Cho, Minsup Shin, Young Hyun Cho, Johan Lim","doi":"10.1007/s10182-024-00516-z","DOIUrl":"10.1007/s10182-024-00516-z","url":null,"abstract":"<div><p>This research proposes a method to test and estimate change points in the covariance structure of high-dimensional multivariate series data. Our method uses the trace of the beta matrix, known as Pillai’s statistics, to test the change in covariance matrix at each time point. We study the asymptotic normality of Pillai’s statistics for testing the equality of two covariance matrices when both sample size and dimension increase at the same rate. We test the existence of a single change point in a given time period using the Cauchy combination test, which uses a weighted sum of Cauchy-transformed <i>p</i>-values, and estimate the change point as the point whose statistic is the greatest. To test and estimate multiple change points, we use the idea of the wild binary segmentation and repeatedly apply the procedure for a single change point to each segmented period until no significant change point exists. We numerically provide the size and power of our method. We finally apply our procedure to finding abnormal behavior in the investment of a private equity fund.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 1","pages":"53 - 84"},"PeriodicalIF":1.4,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-024-00516-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143530001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the equivalence of two mixture models for rating data","authors":"Matteo Ventura, Ambra Macis, Marica Manisera, Paola Zuccolotto","doi":"10.1007/s10182-024-00513-2","DOIUrl":"10.1007/s10182-024-00513-2","url":null,"abstract":"<div><p>Questionnaires are a useful tool for exploring respondents’ perceptions through ratings, assumed to result from a latent decision process (DP). The DP varies when respondents rate on Likert or Semantic Differential scales. A possible paradigm to formalize the DP is based on the presence of a feeling and an uncertainty latent component, originally proposed as the foundations of the CUB (Combination of Uniform and shifted Binomial) class. It can be assumed that with Likert scales, respondents begin reasoning from the bottom, progressing upwards based on their sensations. Conversely, Semantic Differential scale users are assumed to start from the middle and move either upward or downward. The CUM (Combination of Uniform and Multinomial), a new model in the CUB class, derived from this DP, analyzes rating data on a Semantic Differential scale. This paper defines the concept of local and global unidirectional equivalence and studies, from an analytical point of view, the conditions under which CUB and CUM models generate identical theoretical probabilities, in order to enhance the interpretative understanding of the models.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 2","pages":"387 - 411"},"PeriodicalIF":1.4,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145170213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden-Markov models for ordinal time series","authors":"Christian H. Weiß, Osama Swidan","doi":"10.1007/s10182-024-00514-1","DOIUrl":"10.1007/s10182-024-00514-1","url":null,"abstract":"<div><p>A common approach for modeling categorical time series is Hidden-Markov models (HMMs), where the actual observations are assumed to depend on hidden states in their behavior and transitions. Such categorical HMMs are even applicable to nominal data but suffer from a large number of model parameters. In the ordinal case, however, the natural order among the categorical outcomes offers the potential to reduce the number of parameters while improving their interpretability at the same time. The class of ordinal HMMs proposed in this article links a latent-variable approach with categorical HMMs. They are characterized by parametric parsimony and allow the easy calculation of relevant stochastic properties, such as marginal and bivariate probabilities. These points are illustrated by numerical examples and simulation experiments, where the performance of maximum likelihood estimation is analyzed in finite samples. The developed methodology is applied to real-world data from a health application.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 2","pages":"217 - 239"},"PeriodicalIF":1.4,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-024-00514-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Goodness-of-fit testing in bivariate count time series based on a bivariate dispersion index","authors":"Huiqiao Wang, Christian H. Weiß, Mingming Zhang","doi":"10.1007/s10182-024-00512-3","DOIUrl":"10.1007/s10182-024-00512-3","url":null,"abstract":"<div><p>A common choice for the marginal distribution of a bivariate count time series is the bivariate Poisson distribution. In practice, however, the count data may exhibit zero inflation, overdispersion or non-stationarity, such that a marginal bivariate Poisson distribution is not suitable. To test the discrepancy between the actual count data and the bivariate Poisson distribution, we propose a new goodness-of-fit test based on a bivariate dispersion index. The asymptotic distribution of the test statistic under the null hypothesis of a first-order bivariate integer-valued autoregressive model with marginal bivariate Poisson distribution is derived, and the finite-sample performance of the goodness-of-fit test is analyzed by simulations. A real-data example illustrates the application and usefulness of the test in practice.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 2","pages":"241 - 279"},"PeriodicalIF":1.4,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-024-00512-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian joint relatively quantile regression of latent ordinal multivariate linear models with application to multirater agreement analysis","authors":"YuZhu Tian, ChunHo Wu, ManLai Tang, MaoZai Tian","doi":"10.1007/s10182-024-00509-y","DOIUrl":"10.1007/s10182-024-00509-y","url":null,"abstract":"<div><p>In this paper, we propose a Bayesian quantile regression (QR) approach to jointly model multivariate ordinal data. Firstly, a multivariate latent variable model is used to link the multivariate ordinal data and latent continuous responses, and the multivariate asymmetric Laplace (MAL) distribution is employed to construct the joint QR-based working likelihood for the considered model. Secondly, adaptive-<i>L</i><sub>1/2</sub> penalization priors of regression parameters are incorporated into the working likelihood to implement high-dimensional Bayesian joint QR inference. A Markov chain Monte Carlo (MCMC) algorithm is utilized to derive the fully conditional posterior distributions of all parameters. Thirdly, a Bayesian joint relatively QR estimation approach is recommended to obtain more efficient estimates. Finally, Monte Carlo simulation studies and a real-data analysis of multirater agreement data are presented to illustrate the performance of the proposed Bayesian joint relatively QR approach.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"109 1","pages":"85 - 116"},"PeriodicalIF":1.4,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142202706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}