{"title":"Weighted Euclidean balancing for a matrix exposure in estimating causal effect.","authors":"Juan Chen, Yingchun Zhou","doi":"10.1515/ijb-2024-0021","DOIUrl":"https://doi.org/10.1515/ijb-2024-0021","url":null,"abstract":"<p><p>With the increasing complexity of data, researchers in various fields have become increasingly interested in estimating the causal effect of a matrix exposure, which involves complex multivariate treatments, on an outcome. Balancing covariates for the matrix exposure is essential to achieve this goal. While exact balancing and approximate balancing methods have been proposed for multiple balancing constraints, dealing with a matrix treatment introduces a large number of constraints, making it challenging to achieve exact balance or select suitable threshold parameters for approximate balancing methods. To address this challenge, the weighted Euclidean balancing method is proposed, which offers an approximate balance of covariates from an overall perspective. In this study, both parametric and nonparametric methods for estimating the causal effect of a matrix treatment is proposed, along with providing theoretical properties of the two estimations. To validate the effectiveness of our approach, extensive simulation results demonstrate that the proposed method outperforms alternative approaches across various scenarios. Finally, we apply the method to analyze the causal impact of the omics variables on the drug sensitivity of Vandetanib. The results indicate that EGFR CNV has a significant positive causal effect on Vandetanib efficacy, whereas EGFR methylation exerts a significant negative causal effect.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144152240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guidance on individualized treatment rule estimation in high dimensions.","authors":"Philippe Boileau, Ning Leng, Sandrine Dudoit","doi":"10.1515/ijb-2024-0005","DOIUrl":"https://doi.org/10.1515/ijb-2024-0005","url":null,"abstract":"<p><p>Individualized treatment rules, cornerstones of precision medicine, inform patient treatment decisions with the goal of optimizing patient outcomes. These rules are generally unknown functions of patients' pre-treatment covariates, meaning they must be estimated from clinical or observational study data. Myriad methods have been developed to learn these rules, and these procedures are demonstrably successful in traditional asymptotic settings with moderate number of covariates. The finite-sample performance of these methods in high-dimensional covariate settings, which are increasingly the norm in modern clinical trials, has not been well characterized, however. We perform a comprehensive comparison of state-of-the-art individualized treatment rule estimators, assessing performance on the basis of the estimators' rule quality, interpretability, and computational efficiency. Sixteen data-generating processes with continuous outcomes and binary treatment assignments are considered, reflecting a diversity of randomized and observational studies. We summarize our findings and provide succinct advice to practitioners needing to estimate individualized treatment rules in high dimensions. Owing to individualized treatment rule estimators' poor interpretability, we propose a novel pre-treatment covariate filtering procedure based on recent work for uncovering treatment effect modifiers. We show that it improves estimators' rule quality and interpretability. All code is made publicly available, facilitating modifications and extensions to our simulation study.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144151742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk estimation and boundary detection in Bayesian disease mapping.","authors":"Xueqing Yin, Craig Anderson, Duncan Lee, Gary Napier","doi":"10.1515/ijb-2023-0138","DOIUrl":"https://doi.org/10.1515/ijb-2023-0138","url":null,"abstract":"<p><p>Bayesian hierarchical models with a spatially smooth conditional autoregressive prior distribution are commonly used to estimate the spatio-temporal pattern in disease risk from areal unit data. However, most of the modeling approaches do not take possible boundaries of step changes in disease risk between geographically neighbouring areas into consideration, which may lead to oversmoothing of the risk surfaces, prevent the detection of high-risk areas and yield biased estimation of disease risk. In this paper, we propose a two-stage method to jointly estimate the disease risk in small areas over time and detect the locations of boundaries that separate pairs of neighbouring areas exhibiting vastly different risks. In the first stage, we use a graph-based optimisation algorithm to construct a set of candidate neighbourhood matrices that represent a range of possible boundary structures for the disease data. In the second stage, a Bayesian hierarchical spatio-temporal model that takes the boundaries into account is fitted to the data. The performance of the methodology is evidenced by simulation, before being applied to a study of respiratory disease risk in Greater Glasgow, Scotland.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144151799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved estimator of the logarithmic odds ratio for small sample sizes using a Bayesian approach.","authors":"Toru Ogura, Takemi Yanagimoto","doi":"10.1515/ijb-2024-0105","DOIUrl":"https://doi.org/10.1515/ijb-2024-0105","url":null,"abstract":"<p><p>The logarithmic odds ratio is a well-known method for comparing binary data between two independent groups. Although various existing methods proposed for estimating a logarithmic odds ratio, most methods estimate two proportions in each group independently and then estimate the logarithmic odds ratio using the two estimated proportions. When using a logarithmic odds ratio, researchers are more interested in the logarithmic odds ratio than proportions for each group. Parameter estimations, generally, incur random and systematic errors. These errors in initially estimated parameter may affect later estimated parameter. We propose a Bayesian estimator to directly estimate a logarithmic odds ratio without using proportions for each group. Many existing methods need to estimate two parameters (two proportions in each group) to estimate a logarithmic odds ratio; however, the proposed method only estimates one parameter (logarithmic odds ratio). Therefore, the proposed estimator can be closer to the population's logarithmic odds ratio than existing estimators. Additionally, the validity of the proposed estimator is verified using numerical calculations and applications.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144006634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Homogeneity test and sample size of response rates for <i>AC</i> <sub>1</sub> in a stratified evaluation design.","authors":"Jingwei Jia, Yuanbo Liu, Jikai Yang, Zhiming Li","doi":"10.1515/ijb-2024-0080","DOIUrl":"https://doi.org/10.1515/ijb-2024-0080","url":null,"abstract":"<p><p>Gwet's first-order agreement coefficient (<i>AC</i> <sub>1</sub>) is widely used to evaluate the consistency between raters. Considering the existence of a certain relationship between the raters, the paper aims to test the equality of response rates and the dependency between two raters of modified <i>AC</i> <sub>1</sub>'s in a stratified design and estimates the sample size for a given significance level. We first establish a probability model and then estimate the unknown parameters. Further, we explore the homogeneity test of these <i>AC</i> <sub>1</sub>'s under the asymptotic method, such as likelihood ratio, score, and Wald-type statistics. In numerical simulation, the performance of statistics is investigated in terms of type I error rates (TIEs) and power while finding a suitable sample size under a given power. The results show that the Wald-type statistic has robust TIEs and satisfactory power and is suitable for large samples (n≥50). Under the same power, the sample size of the Wald-type test is smaller when the number of strata is large. The higher the power, the larger the required sample size. Finally, two real examples are given to illustrate these methods.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144025779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid hazard-based model using two-piece distributions.","authors":"Worku Biyadgie Ewnetu, Irène Gijbels, Anneleen Verhasselt","doi":"10.1515/ijb-2023-0153","DOIUrl":"https://doi.org/10.1515/ijb-2023-0153","url":null,"abstract":"<p><p>Cox proportional hazards model is widely used to study the relationship between the survival time of an event and covariates. Its primary objective is parameter estimation assuming a constant relative hazard throughout the entire follow-up time. The baseline hazard is thus treated as a nuisance parameter. However, if the interest is to predict possible outcomes like specific quantiles of the distribution (e.g. median survival time), survival and hazard functions, it may be more convenient to use a parametric baseline distribution. Such a parametric model should however be flexible enough to allow for various shapes of e.g. the hazard function. In this paper we propose flexible hazard-based models for right censored data using a large class of two-piece asymmetric baseline distributions. The effect of covariates is characterized through time-scale changes on hazard progression and on the relative hazard ratio; and can take three possible functional forms: parametric, semi-parametric (partly linear) and non-parametric. In the first case, the usual full likelihood estimation method is applied. In the semi-parametric and non-parametric settings a general profile (local) likelihood estimation approach is proposed. An extensive simulation study investigates the finite-sample performances of the proposed method. Its use in data analysis is illustrated in real data examples.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144038766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A review of survival stacking: a method to cast survival regression analysis as a classification problem.","authors":"Erin Craig, Chenyang Zhong, Robert Tibshirani","doi":"10.1515/ijb-2022-0055","DOIUrl":"https://doi.org/10.1515/ijb-2022-0055","url":null,"abstract":"<p><p>While there are many well-developed data science methods for classification and regression, there are relatively few methods for working with right-censored data. Here, we review survival stacking, a method for casting a survival regression analysis problem as a classification problem, thereby allowing the use of general classification methods and software in a survival setting. Inspired by the Cox partial likelihood, survival stacking collects features and outcomes of survival data in a large data frame with a binary outcome. We show that survival stacking with logistic regression is approximately equivalent to the Cox proportional hazards model. We further illustrate survival stacking on real and simulated data. By reframing survival regression problems as classification problems, survival stacking removes the reliance on specialized tools for survival regression, and makes it straightforward for data scientists to use well-known learning algorithms and software for classification in the survival setting. This in turn lowers the barrier for flexible survival modeling.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multivariate Bayesian learning approach for improved detection of doping in athletes using urinary steroid profiles.","authors":"Dimitra Eleftheriou, Thomas Piper, Mario Thevis, Tereza Neocleous","doi":"10.1515/ijb-2024-0019","DOIUrl":"https://doi.org/10.1515/ijb-2024-0019","url":null,"abstract":"<p><p>Biomarker analysis of athletes' urinary steroid profiles is crucial for the success of anti-doping efforts. Current statistical analysis methods generate personalised limits for each athlete based on univariate modelling of longitudinal biomarker values from the urinary steroid profile. However, simultaneous modelling of multiple biomarkers has the potential to further enhance abnormality detection. In this study, we propose a multivariate Bayesian adaptive model for longitudinal data analysis, which extends the established single-biomarker model in forensic toxicology. The proposed approach employs Markov chain Monte Carlo sampling methods and addresses the scarcity of confirmed abnormal values through a one-class classification algorithm. By adapting decision boundaries as new measurements are obtained, the model provides robust and personalised detection thresholds for each athlete. We tested the proposed approach on a database of 229 athletes, which includes longitudinal steroid profiles containing samples classified as normal, atypical, or confirmed abnormal. Our results demonstrate improved detection performance, highlighting the potential value of a multivariate approach in doping detection.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression analysis of clustered current status data with informative cluster size under a transformed survival model.","authors":"Yanqin Feng, Shijiao Yin, Jieli Ding","doi":"10.1515/ijb-2023-0130","DOIUrl":"https://doi.org/10.1515/ijb-2023-0130","url":null,"abstract":"<p><p>In this paper, we study inference methods for regression analysis of clustered current status data with informative cluster sizes. When the correlated failure times of interest arise from a general class of semiparametric transformation frailty models, we develop a nonparametric maximum likelihood estimation based method for regression analysis and conduct an expectation-maximization algorithm to implement it. The asymptotic properties including consistency and asymptotic normality of the proposed estimators are established. Extensive simulation studies are conducted and indicate that the proposed method works well. The developed approach is applied to analyze a real-life data set from a tumorigenicity study.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143674819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prognostic adjustment with efficient estimators to unbiasedly leverage historical data in randomized trials.","authors":"Lauren D Liao, Emilie Højbjerre-Frandsen, Alan E Hubbard, Alejandro Schuler","doi":"10.1515/ijb-2024-0018","DOIUrl":"https://doi.org/10.1515/ijb-2024-0018","url":null,"abstract":"<p><p>Although randomized controlled trials (RCTs) are a cornerstone of comparative effectiveness, they typically have much smaller sample size than observational studies due to financial and ethical considerations. Therefore there is interest in using plentiful historical data (either observational data or prior trials) to reduce trial sizes. Previous estimators developed for this purpose rely on unrealistic assumptions, without which the added data can bias the treatment effect estimate. Recent work proposed an alternative method (prognostic covariate adjustment) that imposes no additional assumptions and increases efficiency in trial analyses. The idea is to use historical data to learn a prognostic model: a regression of the outcome onto the covariates. The predictions from this model, generated from the RCT subjects' baseline variables, are then used as a covariate in a linear regression analysis of the trial data. In this work, we extend prognostic adjustment to trial analyses with nonparametric efficient estimators, which are more powerful than linear regression. We provide theory that explains why prognostic adjustment improves small-sample point estimation and inference without any possibility of bias. Simulations corroborate the theory: efficient estimators using prognostic adjustment compared to without provides greater power (i.e., smaller standard errors) when the trial is small. Population shifts between historical and trial data attenuate benefits but do not introduce bias. We showcase our estimator using clinical trial data provided by Novo Nordisk A/S that evaluates insulin therapy for individuals with type 2 diabetes.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}