{"title":"Designing efficient randomized trials: power and sample size calculation when using semiparametric efficient estimators.","authors":"Alejandro Schuler","doi":"10.1515/ijb-2021-0039","DOIUrl":"https://doi.org/10.1515/ijb-2021-0039","url":null,"abstract":"<p><p>Trials enroll a large number of subjects in order to attain power, making them expensive and time-consuming. Sample size calculations are often performed with the assumption of an unadjusted analysis, even if the trial analysis plan specifies a more efficient estimator (e.g. ANCOVA). This leads to conservative estimates of required sample sizes and an opportunity for savings. Here we show that a relatively simple formula can be used to estimate the power of any two-arm, single-timepoint trial analyzed with a semiparametric efficient estimator, regardless of the domain of the outcome or kind of treatment effect (e.g. odds ratio, mean difference). Since an efficient estimator attains the minimum possible asymptotic variance, this allows for the design of trials that are as small as possible while still attaining design power and control of type I error. The required sample size calculation is parsimonious and requires the analyst to provide only a small number of population parameters. We verify in simulation that the large-sample properties of trials designed this way attain their nominal values. Lastly, we demonstrate how to use this formula in the \"design\" (and subsequent reanalysis) of a real randomized trial and show that fewer subjects are required to attain the same design power when a semiparametric efficient estimator is accounted for at the design stage.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39297947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian adaptive design of early-phase clinical trials for precision medicine based on cancer biomarkers.","authors":"Shinjo Yada","doi":"10.1515/ijb-2021-0009","DOIUrl":"https://doi.org/10.1515/ijb-2021-0009","url":null,"abstract":"<p><p>Cancer tissue samples obtained via biopsy or surgery were examined for specific gene mutations by genetic testing to inform treatment. Precision medicine, which considers not only the cancer type and location, but also the genetic information, environment, and lifestyle of each patient, can be applied for disease prevention and treatment in individual patients. The number of patient-specific characteristics, including biomarkers, has been increasing with time; these characteristics are highly correlated with outcomes. The number of patients at the beginning of early-phase clinical trials is often limited. Moreover, it is challenging to estimate parameters of models that include baseline characteristics as covariates such as biomarkers. To overcome these issues and promote personalized medicine, we propose a dose-finding method that considers patient background characteristics, including biomarkers, using a model for phase I/II oncology trials. We built a Bayesian neural network with input variables of dose, biomarkers, and interactions between dose and biomarkers and output variables of efficacy outcomes for each patient. We trained the neural network to select the optimal dose based on all background characteristics of a patient. Simulation analysis showed that the probability of selecting the desirable dose was higher using the proposed method than that using the naïve method.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2021-0009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39015527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of data aggregation on dispersion estimates in count data models.","authors":"Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder","doi":"10.1515/ijb-2020-0079","DOIUrl":"https://doi.org/10.1515/ijb-2020-0079","url":null,"abstract":"<p><p>For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation have received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by <i>γ</i>-H2AX foci data as used for instance in radiation biodosimetry for the calibration of dose-response curves.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0079","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38961045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power prior for borrowing the real-world data in bioequivalence test with a parallel design.","authors":"Lei Huang, Liwen Su, Yuling Zheng, Yuanyuan Chen, Fangrong Yan","doi":"10.1515/ijb-2020-0119","DOIUrl":"https://doi.org/10.1515/ijb-2020-0119","url":null,"abstract":"<p><p>Recently, real-world study has attracted wide attention for drug development. In bioequivalence study, the reference drug often has been marketed for many years and accumulated abundant real-world data. It is therefore appealing to incorporate these data in the design to improve trial efficiency. In this paper, we propose a Bayesian method to include real-world data of the reference drug in a current bioequivalence trial, with the aim to increase the power of analysis and reduce sample size for long half-life drugs. We adopt the power prior method for incorporating real-world data and use the average bioequivalence posterior probability to evaluate the bioequivalence between the test drug and the reference drug. Simulations were conducted to investigate the performance of the proposed method in different scenarios. The simulation results show that the proposed design has higher power than the traditional design without borrowing real-world data, while controlling the type I error. Moreover, the proposed method saves sample size and reduces costs for the trial.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38960602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate reciprocal relationship between two cause-specific hazard ratios in COVID-19 data with mutually exclusive events","authors":"Wentian Li, S. Cetin, A. Ulgen, M. Cetin, Hakan Şıvgın, Yaning Yang","doi":"10.1101/2021.04.22.21255955","DOIUrl":"https://doi.org/10.1101/2021.04.22.21255955","url":null,"abstract":"Abstract COVID-19 survival data presents a special situation where not only the time-to-event period is short, but also the two events or outcome types, death and release from hospital, are mutually exclusive, leading to two cause-specific hazard ratios (csHR d and csHR r ). The eventual mortality/release outcome is also analyzed by logistic regression to obtain odds-ratio (OR). We have the following three empirical observations: (1) The magnitude of OR is an upper limit of the csHR d : |log(OR)| ≥ |log(csHR d )|. This relationship between OR and HR might be understood from the definition of the two quantities; (2) csHR d and csHR r point in opposite directions: log(csHR d ) ⋅ log(csHR r ) < 0; This relation is a direct consequence of the nature of the two events; and (3) there is a tendency for a reciprocal relation between csHR d and csHR r : csHR d ∼ 1/csHR r . Though an approximate reciprocal trend between the two hazard ratios is in indication that the same factor causing faster death also lead to slow recovery by a similar mechanism, and vice versa, a quantitative relation between csHR d and csHR r in this context is not obvious. These results may help future analyses of data from COVID-19 or other similar diseases, in particular if the deceased patients are lacking, whereas surviving patients are abundant.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42193520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian mixture model for changepoint estimation using ordinal predictors.","authors":"Emily Roberts, Lili Zhao","doi":"10.1515/ijb-2020-0151","DOIUrl":"https://doi.org/10.1515/ijb-2020-0151","url":null,"abstract":"<p><p>In regression models, predictor variables with inherent ordering, such ECOG performance status or novel biomarker expression levels, are commonly seen in medical settings. Statistically, it may be difficult to determine the functional form of an ordinal predictor variable. Often, such a variable is dichotomized based on whether it is above or below a certain cutoff. Other methods conveniently treat the ordinal predictor as a continuous variable and assume a linear relationship with the outcome. However, arbitrarily choosing a method may lead to inaccurate inference and treatment. In this paper, we propose a Bayesian mixture model to consider both dichotomous and linear forms for the variable. This allows for simultaneous assessment of the appropriate form of the predictor in regression models by considering the presence of a changepoint through the lens of a threshold detection problem. This method is applicable to continuous, binary, and survival outcomes, and it is easily amenable to penalized regression. We evaluated the proposed method using simulation studies and apply it to two real datasets. We provide JAGS code for easy implementation.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0151","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25564949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian optimization design for finding a maximum tolerated dose combination in phase I clinical trials.","authors":"Ami Takahashi, Taiji Suzuki","doi":"10.1515/ijb-2020-0147","DOIUrl":"https://doi.org/10.1515/ijb-2020-0147","url":null,"abstract":"<p><p>The development of combination therapies has become commonplace because potential synergistic benefits are expected for resistant patients of single-agent treatment. In phase I clinical trials, the underlying premise is toxicity increases monotonically with increasing dose levels. This assumption cannot be applied in drug combination trials, however, as there are complex drug-drug interactions. Although many parametric model-based designs have been developed, strong assumptions may be inappropriate owing to little information available about dose-toxicity relationships. No standard solution for finding a maximum tolerated dose combination has been established. With these considerations, we propose a Bayesian optimization design for identifying a single maximum tolerated dose combination. Our proposed design utilizing Bayesian optimization guides the next dose by a balance of information between exploration and exploitation on the nonparametrically estimated dose-toxicity function, thereby allowing us to reach a global optimum with fewer evaluations. We evaluate the proposed design by comparing it with a Bayesian optimal interval design and with the partial-ordering continual reassessment method. The simulation results suggest that the proposed design works well in terms of correct selection probabilities and dose allocations. The proposed design has high potential as a powerful tool for use in finding a maximum tolerated dose combination.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0147","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25560130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More than one way: exploring the capabilities of different estimation approaches to joint models for longitudinal and time-to-event outcomes.","authors":"Anja Rappl, Andreas Mayr, Elisabeth Waldmann","doi":"10.1515/ijb-2020-0067","DOIUrl":"https://doi.org/10.1515/ijb-2020-0067","url":null,"abstract":"<p><p>The development of physical functioning after a caesura in an aged population is still widely unexplored. Analysis of this topic would need to model the longitudinal trajectories of physical functioning and simultaneously take terminal events (deaths) into account. Separate analysis of both results in biased estimates, since it neglects the inherent connection between the two outcomes. Thus, this type of data generating process is best modelled jointly. To facilitate this several software applications were made available. They differ in model formulation, estimation technique (likelihood-based, Bayesian inference, statistical boosting) and a comparison of the different approaches is necessary to identify their capabilities and limitations. Therefore, we compared the performance of the packages JM, joineRML, JMbayes and JMboost of the R software environment with respect to estimation accuracy, variable selection properties and prediction precision. With these findings we then illustrate the topic of physical functioning after a caesura with data from the German ageing survey (DEAS). The results suggest that in smaller data sets and theory driven modelling likelihood-based methods (expectation maximation, JM, joineRML) or Bayesian inference (JMbayes) are preferable, whereas statistical boosting (JMboost) is a better choice with high-dimensional data and data exploration settings.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0067","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25560133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data.","authors":"Yixin Kong, Ariangela Kozik, Cindy H Nakatsu, Yava L Jones-Hall, Hyonho Chun","doi":"10.1515/ijb-2020-0039","DOIUrl":"https://doi.org/10.1515/ijb-2020-0039","url":null,"abstract":"<p><p>A latent factor model for count data is popularly applied in deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0039","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25541025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of random-effects misspecification on classification accuracy.","authors":"Riham El Saeiti, Marta García-Fiñana, David M Hughes","doi":"10.1515/ijb-2019-0159","DOIUrl":"10.1515/ijb-2019-0159","url":null,"abstract":"<p><p>Mixed models are a useful way of analysing longitudinal data. Random effects terms allow modelling of patient specific deviations from the overall trend over time. Correlation between repeated measurements are captured by specifying a joint distribution for all random effects in a model. Typically, this joint distribution is assumed to be a multivariate normal distribution. For Gaussian outcomes misspecification of the random effects distribution usually has little impact. However, when the outcome is discrete (e.g. counts or binary outcomes) generalised linear mixed models (GLMMs) are used to analyse longitudinal trends. Opinion is divided about how robust GLMMs are to misspecification of the random effects. Previous work explored the impact of random effects misspecification on the bias of model parameters in single outcome GLMMs. Accepting that these model parameters may be biased, we investigate whether this affects our ability to classify patients into clinical groups using a longitudinal discriminant analysis. We also consider multiple outcomes, which can significantly increase the dimensions of the random effects distribution when modelled simultaneously. We show that when there is severe departure from normality, more flexible mixture distributions can give better classification accuracy. However, in many cases, wrongly assuming a single multivariate normal distribution has little impact on classification accuracy.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2019-0159","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25519700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}