{"title":"Comments on “sensitivity of estimands in clinical trials with imperfect compliance” by Chen and Heitjan","authors":"Stuart G. Baker, Karen S. Lindeman","doi":"10.1515/ijb-2023-0127","DOIUrl":"https://doi.org/10.1515/ijb-2023-0127","url":null,"abstract":"Chen and Heitjan (Sensitivity of estimands in clinical trials with imperfect compliance. Int J Biostat. 2023) used linear extrapolation to estimate the population average causal effect (PACE) from the complier average causal effect (CACE) in multiple randomized trials with all-or-none compliance. For extrapolating from CACE to PACE in this setting and in the paired availability design involving different availabilities of treatment among before-and-after studies, we recommend the sensitivity analysis in Baker and Lindeman (J Causal Inference, 2013) because it is not restricted to a linear model, as it involves various random effect and trend models.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"72 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141785513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting differentially expressed genes from RNA-seq data using fuzzy clustering","authors":"Yuki Ando, Asanao Shimokawa","doi":"10.1515/ijb-2023-0125","DOIUrl":"https://doi.org/10.1515/ijb-2023-0125","url":null,"abstract":"A two-group comparison test is generally performed on RNA sequencing data to detect differentially expressed genes (DEGs). However, the accuracy of this method is low due to the small sample size. To address this, we propose a method using fuzzy clustering that artificially generates data with expression patterns similar to those of DEGs to identify genes that are highly likely to be classified into the same cluster as the initial cluster data. The proposed method is advantageous in that it does not perform any test. Furthermore, a certain level of accuracy can be maintained even when the sample size is biased, and we show that such a situation may improve the accuracy of the proposed method. We compared the proposed method with the conventional method using simulations. In the simulations, we changed the sample size and difference between the expression levels of group 1 and group 2 in the DEGs to obtain the desired accuracy of the proposed method. The results show that the proposed method is superior in all cases under the conditions simulated. We also show that the effect of the difference between group 1 and group 2 on the accuracy is more prominent when the sample size is biased.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"61 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141778930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random forests for survival data: which methods work best and under what conditions?","authors":"Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin","doi":"10.1515/ijb-2023-0056","DOIUrl":"https://doi.org/10.1515/ijb-2023-0056","url":null,"abstract":"Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods’ relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"8 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140801340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kalman filter with impulse noised outliers: a robust sequential algorithm to filter data with a large number of outliers","authors":"Bertrand Cloez, Bénédicte Fontez, Eliel González-García, Isabelle Sanchez","doi":"10.1515/ijb-2023-0065","DOIUrl":"https://doi.org/10.1515/ijb-2023-0065","url":null,"abstract":"Impulse noised outliers are data points that differ significantly from other observations. They are generally removed from the data set through local regression or the Kalman filter algorithm. However, these methods, or their generalizations, are not well suited when the number of outliers is of the same order as the number of low-noise data (often called nominal measurement). In this article, we propose a new model for impulsed noise outliers. It is based on a hierarchical model and a simple linear Gaussian process as with the Kalman Filter. We present a fast forward-backward algorithm to filter and smooth sequential data and which also detects these outliers. We compare the robustness and efficiency of this algorithm with classical methods. Finally, we apply this method on a real data set from a Walk Over Weighing system admitting around 60 % of outliers. For this application, we further develop an (explicit) EM algorithm to calibrate some algorithm parameters.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"4 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140613617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble learning methods of inference for spatially stratified infectious disease systems","authors":"Jeffrey Peitsch, Gyanendra Pokharel, Shakhawat Hossain","doi":"10.1515/ijb-2023-0102","DOIUrl":"https://doi.org/10.1515/ijb-2023-0102","url":null,"abstract":"Individual level models are a class of mechanistic models that are widely used to infer infectious disease transmission dynamics. These models incorporate individual level covariate information accounting for population heterogeneity and are generally fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. However, Bayesian MCMC methods of inference are computationally expensive for large data sets. This issue becomes more severe when applied to infectious disease data collected from spatially heterogeneous populations, as the number of covariates increases. In addition, summary statistics over the global population may not capture the true spatio-temporal dynamics of disease transmission. In this study we propose to use ensemble learning methods to predict epidemic generating models instead of time consuming Bayesian MCMC method. We apply these methods to infer disease transmission dynamics over spatially clustered populations, considering the clusters as natural strata instead of a global population. We compare the performance of two tree-based ensemble learning techniques: random forest and gradient boosting. These methods are applied to the 2001 foot-and-mouth disease epidemic in the U.K. and evaluated using simulated data from a clustered population. It is shown that the spatially clustered data can help to predict epidemic generating models more accurately than the global data.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"56 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140569024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The survival function NPMLE for combined right-censored and length-biased right-censored failure time data: properties and applications","authors":"James H. McVittie, David B. Wolfson, David A. Stephens","doi":"10.1515/ijb-2023-0121","DOIUrl":"https://doi.org/10.1515/ijb-2023-0121","url":null,"abstract":"Many cohort studies in survival analysis have imbedded in them subcohorts consisting of incident cases and prevalent cases. Instead of analysing the data from the incident and prevalent cohorts alone, there are surely advantages to combining the data from these two subcohorts. In this paper, we discuss a survival function nonparametric maximum likelihood estimator (NPMLE) using both length-biased right-censored prevalent cohort data and right-censored incident cohort data. We establish the asymptotic properties of the survival function NPMLE and utilize the NPMLE to estimate the distribution for time spent in a Montreal area hospital.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"56 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140569156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination","authors":"Martin Paries, Evelyne Vigneau, Adeline Huneau, Olivier Lantz, Stéphanie Bougeard","doi":"10.1515/ijb-2023-0062","DOIUrl":"https://doi.org/10.1515/ijb-2023-0062","url":null,"abstract":"Studying a large number of variables measured on the same observations and organized in blocks – denoted multiblock data – is becoming standard in several domains especially in biology. To explore the relationships between all these variables – at the block- and the variable-level – several exploratory multiblock methods were proposed. However, most of them are only designed for numeric variables. In reality, some data sets contain variables of different measurement levels (i.e., numeric, nominal, ordinal). In this article, we focus on exploratory multiblock methods that handle variables at their appropriate measurement level. Multi-Block Principal Component Analysis with Optimal Scaling (MBPCA-OS) is proposed and applied to multiblock data from the CURIE-O-SA French cohort. In this study, variables are of different measurement levels and organized in four blocks. The objective is to study the immune responses according to the SARS-CoV-2 infection and vaccination statuses, the symptoms and the participant’s characteristics.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"92 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138579777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing efficient randomized trials: power and sample size calculation when using semiparametric efficient estimators.","authors":"Alejandro Schuler","doi":"10.1515/ijb-2021-0039","DOIUrl":"https://doi.org/10.1515/ijb-2021-0039","url":null,"abstract":"Trials enroll a large number of subjects in order to attain power, making them expensive and time-consuming. Sample size calculations are often performed with the assumption of an unadjusted analysis, even if the trial analysis plan specifies a more efficient estimator (e.g. ANCOVA). This leads to conservative estimates of required sample sizes and an opportunity for savings. Here we show that a relatively simple formula can be used to estimate the power of any two-arm, single-timepoint trial analyzed with a semiparametric efficient estimator, regardless of the domain of the outcome or kind of treatment effect (e.g. odds ratio, mean difference). Since an efficient estimator attains the minimum possible asymptotic variance, this allows for the design of trials that are as small as possible while still attaining design power and control of type I error. The required sample size calculation is parsimonious and requires the analyst to provide only a small number of population parameters. We verify in simulation that the large-sample properties of trials designed this way attain their nominal values. Lastly, we demonstrate how to use this formula in the \"design\" (and subsequent reanalysis) of a real randomized trial and show that fewer subjects are required to attain the same design power when a semiparametric efficient estimator is accounted for at the design stage.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"151-171"},"PeriodicalIF":1.2,"publicationDate":"2021-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39297947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian adaptive design of early-phase clinical trials for precision medicine based on cancer biomarkers.","authors":"Shinjo Yada","doi":"10.1515/ijb-2021-0009","DOIUrl":"https://doi.org/10.1515/ijb-2021-0009","url":null,"abstract":"Cancer tissue samples obtained via biopsy or surgery were examined for specific gene mutations by genetic testing to inform treatment. Precision medicine, which considers not only the cancer type and location, but also the genetic information, environment, and lifestyle of each patient, can be applied for disease prevention and treatment in individual patients. The number of patient-specific characteristics, including biomarkers, has been increasing with time; these characteristics are highly correlated with outcomes. The number of patients at the beginning of early-phase clinical trials is often limited. Moreover, it is challenging to estimate parameters of models that include baseline characteristics as covariates such as biomarkers. To overcome these issues and promote personalized medicine, we propose a dose-finding method that considers patient background characteristics, including biomarkers, using a model for phase I/II oncology trials. We built a Bayesian neural network with input variables of dose, biomarkers, and interactions between dose and biomarkers and output variables of efficacy outcomes for each patient. We trained the neural network to select the optimal dose based on all background characteristics of a patient. Simulation analysis showed that the probability of selecting the desirable dose was higher using the proposed method than that using the naïve method.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"109-125"},"PeriodicalIF":1.2,"publicationDate":"2021-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2021-0009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39015527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of data aggregation on dispersion estimates in count data models.","authors":"Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder","doi":"10.1515/ijb-2020-0079","DOIUrl":"https://doi.org/10.1515/ijb-2020-0079","url":null,"abstract":"For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation have received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used for instance in radiation biodosimetry for the calibration of dose-response curves.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"183-202"},"PeriodicalIF":1.2,"publicationDate":"2021-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0079","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38961045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}