{"title":"Spatial smoothing revisited: An application to rental data in Munich","authors":"L. Fahrmeir, G. Kauermann, G. Tutz, Michael Windmann","doi":"10.1177/1471082x231158465","DOIUrl":"https://doi.org/10.1177/1471082x231158465","url":null,"abstract":"Spatial smoothing makes use of spatial information to obtain better estimates in regression models. In particular flexible smoothing with B-splines and penalties, which has been propagated by Eilers and Marx (1996) , provides strong tools that can be used to include available spatial information. We consider alternative smoothing methods in spatial additive regression and employ them for analysing rental data in Munich. The first method applies tensor product P-splines to the geolocation of apartments, measured on a continuous scale through the centroid of the quarter where an apartment is. The alternative approach exploits the neighbourhood structure of districts on a discrete scale, where districts consist of a set of neighbouring quarters. The discrete modelling approach yields smooth estimates when using ridge-type penalties but can also enforce spatial clustering of districts with a homogeneous structure when using Lasso-type penalties.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46273613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor product P-splines using a sparse mixed model formulation","authors":"M. Boer","doi":"10.1177/1471082x231178591","DOIUrl":"https://doi.org/10.1177/1471082x231178591","url":null,"abstract":"A new approach to represent P-splines as a mixed model is presented. The corresponding matrices are sparse allowing the new approach can find the optimal values of the penalty parameters in a computationally efficient manner. Whereas the new mixed model P-splines formulation is similar to the original P-splines, a key difference is that the fixed effects are modelled explicitly, and extra constraints are added to the random part of the model. An important feature ensuring that the entire computation is fast is a sparse implementation of the Automated Differentiation of the Cholesky algorithm. It is shown by means of two examples that the new approach is fast compared to existing methods. The methodology has been implemented in the R-package LMMsolver available on CRAN ( https://CRAN.R-project.org/package=LMMsolver ).","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47324621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear or smooth? Enhanced model choice in boosting via deselection of base-learners","authors":"A. Mayr, T. Wistuba, Jan Speller, F. Gudé, B. Hofner","doi":"10.1177/1471082x231170045","DOIUrl":"https://doi.org/10.1177/1471082x231170045","url":null,"abstract":"The specification of a particular type of effect (e.g., linear or non-linear) of a covariate in a regression model can be either based on graphical assessment, subject matter knowledge or also on data-driven model choice procedures. For the latter variant, we present a boosting approach that is available for a huge number of different model classes. Boosting is an indirect regularization technique that leads to variable selection and can easily incorporate also non-linear or smooth effects. Furthermore, the algorithm can be adapted in a way to automatically select whether to model a continuous variable with a smooth or a linear effect. We enhance this model choice procedure by trying to compensate the inherent bias towards the more complex effect by incorporating a pragmatic and simple deselection technique that was originally implemented for enhanced variable selection. We illustrate our approach in the analysis of T3 thyroid hormone levels from a larger Galician cohort and investigate its performance in a simulation study.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":"1 1","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41395613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint modelling of non-crossing additive quantile regression via constrained B-spline varying coefficients","authors":"V. Muggeo, G. Sottile, G. Cilluffo","doi":"10.1177/1471082x231181734","DOIUrl":"https://doi.org/10.1177/1471082x231181734","url":null,"abstract":"We present a unified framework able to fit the entire quantile process, namely to estimate simultaneously multiple non-crossing quantile curves. The framework relies on assuming each regression parameter varies smoothly across the percentile direction according to B-splines whose coefficients obey proper restrictions. Multiple linear and penalized smooth terms are allowed and the corresponding tuning parameters are estimated efficiently as part of the model fitting. Monotonicity and concavity constraints on the smoothed relationships are also easily accounted for in the framework. Simulation results provide evidence our proposal exhibits good statistical performance with respect to competitors while guaranteeing the non-crossing property and modest computational load. Analyses on a real dataset related to vocabulary size growth are presented to illustrate the model capability in practice.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46411772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian semiparametric mixed effects proportional hazards model for clustered partly interval-censored data","authors":"Chun Pan, B. Cai","doi":"10.1177/1471082x231165559","DOIUrl":"https://doi.org/10.1177/1471082x231165559","url":null,"abstract":"Clustered partly interval-censored survival data naturally arise from many medical and epidemiological studies. We propose a Bayesian semiparametric approach for fitting a mixed effects proportional hazards (PH) model to clustered partly interval-censored data. The proposed method allows for not only a random intercept as most frailty models do for clustered survival data, but also random effects of covariates. We assume a normal prior for each random intercept/random effect, seeing the instability of a gamma prior for a frailty in this situation. Simulation studies with data generated from both mixed effects PH model and mixed effects accelerated failure times model are conducted, to evaluate the performance of the proposed method and compare it with the three methods currently available in the literature. The application of the proposed approach is illustrated through analyzing the progression-free survival data derived from a phase III metastatic colorectal cancer clinical trial.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49442378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust clustering based on finite mixture of multivariate fragmental distributions","authors":"M. Maleki, G. McLachlan, Sharon X. Lee","doi":"10.1177/1471082X211048660","DOIUrl":"https://doi.org/10.1177/1471082X211048660","url":null,"abstract":"A flexible class of multivariate distributions called scale mixtures of fragmental normal (SMFN) distributions, is introduced. Its extension to the case of a finite mixture of SMFN (FM-SMFN) distributions is also proposed. The SMFN family of distributions is convenient and effective for modelling data with skewness, discrepant observations and population heterogeneity. It also possesses some other desirable properties, including an analytically tractable density and ease of computation for simulation and estimation of parameters. A stochastic representation of the SMFN distribution is given and then a hierarchical representation is described, the latter aids in parameter estimation, derivation of statistical properties and simulations. Maximum likelihood estimation of the FM-SMFN distribution via the expectation–maximization (EM) algorithm is outlined before the clustering performance of the proposed mixture model is illustrated using simulated and real datasets. In particular, the ability of FM-SMFN distributions to model data generated from various well-known families is demonstrated.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":"23 1","pages":"247 - 272"},"PeriodicalIF":1.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46752796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Canonical correlation analysis in high dimensions with structured regularization","authors":"Elena Tuzhilina, Leonardo Tozzi, Trevor Hastie","doi":"10.1177/1471082x211041033","DOIUrl":"https://doi.org/10.1177/1471082x211041033","url":null,"abstract":"Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) which imposes an ℓ₂ penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":"23 3","pages":"203-227"},"PeriodicalIF":1.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10274416/pdf/nihms-1834734.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9711519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multilevel analysis of real estate valuation using distributional and quantile regression","authors":"Alexander Razen, Wolfgang A. Brunauer, N. Klein, T. Kneib, S. Lang, Nikolaus Umlauf","doi":"10.1177/1471082x231157205","DOIUrl":"https://doi.org/10.1177/1471082x231157205","url":null,"abstract":"Real estate valuation is typically based on hedonic regression models where the expected price of a property is explained in dependence of its attributes. However, investors in the housing market are equally interested in the distribution of real estate market values (including price variation), that is, determining the impact of attributes of a property on the entire conditional distribution. We therefore consider Bayesian structured additive distributional and quantile regression models for real estate valuation. In the first approach, each parameter of a potentially complex parametric response distribution is related to a structured additive predictor. In contrast, the second approach proceeds differently and models arbitrary quantiles of the response distribution directly and nonparametrically. Both models presented are based on a multilevel version of structured additive regression thereby utilizing the typical hierarchical structure of real estate data. We demonstrate the proposed methodology within a detailed case study based on more than 3 000 owner-occupied single family homes in Austria, discuss interpretation of the resulting effect estimates, and compare models based on their predictive ability.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45655548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional beta-binomial regression model: A joint analysis of patient-reported outcomes","authors":"J. Najera-Zuloaga, Dae-Jin Lee, C. Esteban, I. Arostegui","doi":"10.1177/1471082x231151311","DOIUrl":"https://doi.org/10.1177/1471082x231151311","url":null,"abstract":"Patient-reported outcomes (PROs) are often used as primary outcomes in clinical research studies. PROs are usually measured in ordinal scales and they tend to have excess variability beyond the binomial distribution, a property called overdispersion. Beta-binomial distribution has been previously proposed in this context in order to fit PROs, and beta-binomial regression (BBR) as a good alternative for modelling purposes, including the extension to mixed-effects models in a longitudinal framework. Many PROs have various health dimensions, which are commonly correlated within subjects. However, in clinical analysis, dimensions are separately analysed. In this work, we propose a multidimensional BBR model that incorporates a multidimensional outcome including several PROs in a joint analysis. The proposal has been evaluated and compared to the independent analysis through a simulation study and a real data application with patients with respiratory disease. Results show the advantages that a multidimensional approach offers in terms of parameter significance and interpretation. Additionally, the methods proposed in this work are implemented in the PROreg R-package developed by the authors.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42363693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A two-part measurement error model to estimate participation in undeclared work and related earnings","authors":"Maria Felice Arezzo, Serena Arima, G. Guagnano","doi":"10.1177/1471082x221145240","DOIUrl":"https://doi.org/10.1177/1471082x221145240","url":null,"abstract":"In undeclared work research, the estimation of the magnitude of the phenomenon (i.e., the amount of income and/or the percentage of workers involved) is of major interest. This has been done either using indirect methods or by means of ad hoc surveys such as the Eurobarometer special survey on undeclared work, our motivating study. The extent of undeclared work can be measured by means of two different outcomes: the event of working off-the-book (binary variable) and, when the event occurs, the amount of earnings deriving from the undeclared activity (continuous variable). This setup has been typically modeled via the so called two-part model: a binary choice model for the probability of observing a positive-versus-zero outcome and then, conditional on a positive outcome, a regression model for the positive outcome. We propose an extension of the two-part model that goes in two directions. The first regards the measurement error that, given the very nature of undeclared activities, is most likely to affect both the outcomes of interest. The second is that we generalize the linear regression part of the model to allow individual-level means. We also conduct an extensive simulation study to investigate the performance of the proposed model and compare it with traditional approaches.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47608085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}