Sophie Potts, Elisabeth Bergherr, Constantin Reinke, Colin Griesbach
{"title":"Prediction-based variable selection for component-wise gradient boosting.","authors":"Sophie Potts, Elisabeth Bergherr, Constantin Reinke, Colin Griesbach","doi":"10.1515/ijb-2023-0052","DOIUrl":"10.1515/ijb-2023-0052","url":null,"abstract":"Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. To improve its prediction and selection qualities even further, several modifications of the original algorithm have been developed that mainly focus on different stopping criteria, leaving the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule relying on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them regarding their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, whereas the prediction error could be lowered in a real-world application with age-standardized COVID-19 incidence rates.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":"293-314"},"PeriodicalIF":1.2,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138435376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian second-order sensitivity of longitudinal inferences to non-ignorability: an application to antidepressant clinical trial data.","authors":"Elahe Momeni Roochi, Samaneh Eftekhari Mahabadi","doi":"10.1515/ijb-2022-0014","DOIUrl":"10.1515/ijb-2022-0014","url":null,"abstract":"Incomplete data are a prevalent complication in longitudinal studies because individuals drop out before the intended completion time. Currently available methods in commercial software for analyzing incomplete longitudinal data at best rely on the ignorability of the drop-outs. If the underlying missingness mechanism is non-ignorable, bias may arise in the statistical inferences. To remove the bias when the drop-out is non-ignorable, joint complete-data and drop-out models have been proposed, which involve computational difficulties and untestable assumptions. Since the critical ignorability assumption is unverifiable based on the observed part of the sample, some local sensitivity indices have been proposed in the literature. Specifically, Eftekhari Mahabadi (Second-order local sensitivity to non-ignorability in Bayesian inferences. Stat Med 2018;59:55-95) proposed a second-order local sensitivity tool for Bayesian analysis of cross-sectional studies and showed that it handles bias better than the first-order ones. In this paper, we aim to extend this index to the Bayesian sensitivity analysis of normal longitudinal studies with drop-outs. The index is derived from a selection model for the drop-out mechanism and a Bayesian linear mixed-effect complete-data model. The presented formulas are calculated using the posterior estimation and draws from the simpler ignorable model. The method is illustrated via simulation studies and a sensitivity analysis of real antidepressant clinical trial data. Overall, the numerical analysis showed that when repeated outcomes are subject to missingness, regression coefficient estimates are approximated well by a linear function in the neighbourhood of the MAR model, but there is a considerable amount of second-order sensitivity for the error term and random effect variances in the Bayesian linear mixed-effect model framework.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":"599-629"},"PeriodicalIF":1.2,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138441586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting incidence rates comparison under right censorship.","authors":"Pablo Martínez-Camblor, Susana Díaz-Coto","doi":"10.1515/ijb-2023-0025","DOIUrl":"10.1515/ijb-2023-0025","url":null,"abstract":"Data description is the first step in understanding the nature of the problem at hand. Usually, it is a simple task that does not require any particular assumption. However, the interpretation of the descriptive measures used can be a source of confusion and misunderstanding. The incidence rate is the quotient between the number of observed events and the sum of time that the studied population was at risk of having this event (person-time). Despite this apparently simple definition, its interpretation is not free of complexity. In this piece of research, we revisit the incidence rate estimator under right censorship. We analyze the effect that the censoring time distribution can have on the observed results, and its relevance in the comparison of two or more incidence rates. We propose a solution for limiting the impact that the data collection process can have on the results of the hypothesis testing. We explore the finite-sample behavior of the considered estimators through Monte Carlo simulations. Two examples based on synthetic data illustrate the considered problem. The R code and data used are provided as Supplementary Material.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":"491-506"},"PeriodicalIF":1.2,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89720368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods.","authors":"Li-Chu Chien","doi":"10.1515/ijb-2022-0123","DOIUrl":"10.1515/ijb-2022-0123","url":null,"abstract":"In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. Multinomial regression is an extension of binary logistic regression that allows for multiple categories. However, many GWAS methods have been limited in application to binary traits. These methods have often been improperly used for ordinal traits, which causes inappropriate type I error rates and poor statistical power. Owing to the lack of analysis methods, GWAS of ordinal traits is known to be problematic and is gaining attention. In this paper, we develop a general framework for identifying ordinal traits associated with genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE technology to account for complicated correlation structures between family members and ordered categorical traits. We use the retrospective idea to treat the genetic markers as random variables for calculating genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allow for covariate adjustment. We conduct simulation studies to compare the proposed tests with the existing models for analyzing ordered categorical data under various configurations. We illustrate the application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":"677-690"},"PeriodicalIF":1.2,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41177324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wentian Li, S. Cetin, A. Ulgen, M. Cetin, Hakan Şıvgın, Yaning Yang
{"title":"Approximate reciprocal relationship between two cause-specific hazard ratios in COVID-19 data with mutually exclusive events","authors":"Wentian Li, S. Cetin, A. Ulgen, M. Cetin, Hakan Şıvgın, Yaning Yang","doi":"10.1101/2021.04.22.21255955","DOIUrl":"https://doi.org/10.1101/2021.04.22.21255955","url":null,"abstract":"COVID-19 survival data presents a special situation where not only the time-to-event period is short, but also the two events or outcome types, death and release from hospital, are mutually exclusive, leading to two cause-specific hazard ratios (csHR_d and csHR_r). The eventual mortality/release outcome is also analyzed by logistic regression to obtain an odds ratio (OR). We have the following three empirical observations: (1) The magnitude of OR is an upper limit of that of csHR_d: |log(OR)| ≥ |log(csHR_d)|. This relationship between OR and HR might be understood from the definition of the two quantities; (2) csHR_d and csHR_r point in opposite directions: log(csHR_d) ⋅ log(csHR_r) < 0. This relation is a direct consequence of the nature of the two events; and (3) there is a tendency for a reciprocal relation between csHR_d and csHR_r: csHR_d ∼ 1/csHR_r. Though an approximate reciprocal trend between the two hazard ratios is an indication that the same factor causing faster death also leads to slow recovery by a similar mechanism, and vice versa, a quantitative relation between csHR_d and csHR_r in this context is not obvious. These results may help future analyses of data from COVID-19 or other similar diseases, in particular if the deceased patients are lacking, whereas surviving patients are abundant.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"0 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42193520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Palmes, Tobias Bluhmki, Benedikt Funke, E. Bluhmki
{"title":"Asymptotic properties of the two one-sided t-tests – new insights and the Schuirmann-constant","authors":"Christian Palmes, Tobias Bluhmki, Benedikt Funke, E. Bluhmki","doi":"10.1515/IJB-2020-0057","DOIUrl":"https://doi.org/10.1515/IJB-2020-0057","url":null,"abstract":"The two one-sided t-tests (TOST) method is the most popular statistical equivalence test with many areas of application, e.g., in the pharmaceutical industry. Proper sample size calculation is needed in order to show equivalence with a certain power. Here, the crucial problem of choosing a suitable mean-difference in TOST sample size calculations is addressed. As an alternative concept, it is assumed that the mean-difference follows an a-priori distribution. Special interest is given to the uniform and some centered triangle a-priori distributions. Using a newly developed asymptotic theory, a helpful analogy principle is found: every a-priori distribution corresponds to a point mean-difference, which we call its Schuirmann-constant. This constant does not depend on the standard deviation and aims to support the investigator in finding a well-considered mean-difference for proper sample size calculations in complex data situations. In addition to the proposed concept, we demonstrate that well-known sample size approximation formulas in the literature are in fact biased and state their unbiased corrections as well. Moreover, an R package is provided for immediate application of our newly developed concepts.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"19 - 38"},"PeriodicalIF":1.2,"publicationDate":"2021-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/IJB-2020-0057","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46667419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of semi-Markov multi-state models: a comparison of the sojourn times and transition intensities approaches","authors":"A. Asanjarani, B. Liquet, Y. Nazarathy","doi":"10.1515/IJB-2020-0083","DOIUrl":"https://doi.org/10.1515/IJB-2020-0083","url":null,"abstract":"Semi-Markov models are widely used for survival analysis and reliability analysis. In general, there are two competing parameterizations and each entails its own interpretation and inference properties. On the one hand, a semi-Markov process can be defined based on the distribution of sojourn times, often via hazard rates, together with transition probabilities of an embedded Markov chain. On the other hand, intensity transition functions may be used, often referred to as the hazard rates of the semi-Markov process. We summarize and contrast these two parameterizations both from a probabilistic and an inference perspective, and we highlight relationships between the two approaches. In general, the intensity transition based approach allows the likelihood to be split into likelihoods of two-state models having fewer parameters, allowing efficient computation and usage of many survival analysis tools. Nevertheless, in certain cases the sojourn time based approach is natural and has been exploited extensively in applications. In contrasting the two approaches and contemporary relevant R packages used for inference, we use two real datasets highlighting the probabilistic and inference properties of each approach. This analysis is accompanied by an R vignette.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"243 - 262"},"PeriodicalIF":1.2,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/IJB-2020-0083","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43491644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Contact Network Uncertainty in Individual Level Models of Infectious Disease using Approximate Bayesian Computation","authors":"Waleed Almutiry, R. Deardon","doi":"10.1515/ijb-2017-0092","DOIUrl":"https://doi.org/10.1515/ijb-2017-0092","url":null,"abstract":"Infectious disease transmission between individuals in a heterogeneous population is often best modelled through a contact network. However, such contact network data are often unobserved. Such missing data can be accounted for in a Bayesian data augmented framework using Markov chain Monte Carlo (MCMC). Unfortunately, fitting models in such a framework can be highly computationally intensive. We investigate the fitting of network-based infectious disease models with completely unknown contact networks using approximate Bayesian computation population Monte Carlo (ABC-PMC) methods. This is done in the context of both simulated data and data from the UK 2001 foot-and-mouth disease epidemic. We show that ABC-PMC is able to obtain reasonable approximations of the underlying infectious disease model with huge savings in computation time when compared to a full Bayesian MCMC analysis.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2019-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2017-0092","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42487422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}