{"title":"A Weighted Risk Set Estimator for Survival Distributions in Two-Stage Randomization Designs with Censored Survival Data","authors":"Xiang Guo, A. Tsiatis","doi":"10.2202/1557-4679.1000","DOIUrl":"https://doi.org/10.2202/1557-4679.1000","url":null,"abstract":"In many clinical trials related to diseases such as cancers and HIV, patients are treated by different combinations of therapies. This leads to two-stage designs, where patients are initially randomized to a primary therapy and then depending on disease remission and patients' consent, a maintenance therapy will be randomly assigned. In such designs, the effects of different treatment policies, i.e., combinations of primary and maintenance therapy are of great interest. In this paper, we propose an estimator for the survival distribution for each treatment policy in such two-stage studies with right-censoring using the method of weighted estimation equations within risk sets. We also derive the large-sample properties. The method is demonstrated and compared with other estimators through simulations and applied to analyze a two-stage randomized study with leukemia patients.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relationship between Derivatives of the Observed and Full Loglikelihoods and Application to Newton-Raphson Algorithm","authors":"D. Commenges, V. Rondeau","doi":"10.2202/1557-4679.1010","DOIUrl":"https://doi.org/10.2202/1557-4679.1010","url":null,"abstract":"In the case of incomplete data we give general relationships between the first and second derivatives of the loglikelihood relative to the full and the incomplete observation set-ups. In the case where these quantities are easy to compute for the full observation set-up we propose to compute their analogue for the incomplete observation set-up using the above mentioned relationships: this involves numerical integrations. Once we are able to compute these quantities, Newton-Raphson type algorithms can be applied to find the maximum likelihood estimators, together with estimates of their variances. We detail the application of this approach to parametric multiplicative frailty models and we show that the method works well in practice using both a real data and a simulated example. The proposed algorithm outperforms a Newton-Raphson type algorithm using numerical derivatives.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate Power and Sample Size Calculations with the Benjamini-Hochberg Method","authors":"J. A. Ferreira, A. Zwinderman","doi":"10.2202/1557-4679.1018","DOIUrl":"https://doi.org/10.2202/1557-4679.1018","url":null,"abstract":"We provide a method for calculating the sample size required to attain a given average power (the ratio of rejected hypotheses to the number of false hypotheses) and a given false discovery rate (the number of incorrect rejections divided by the number of rejections) in adaptive versions of the Benjamini-Hochberg method of multiple testing. The method works in an asymptotic sense as the number of hypotheses grows to infinity and under quite general conditions, and it requires data from a pilot study. The consistency of the method follows from several results in classical areas of nonparametric statistics developed in a new context of \"weak\" dependence.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1018","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Classification of Abnormal Blood Profiles in Athletes","authors":"P. Sottas, N. Robinson, S. Giraud, F. Taroni, M. Kamber, P. Mangin, M. Saugy","doi":"10.2202/1557-4679.1011","DOIUrl":"https://doi.org/10.2202/1557-4679.1011","url":null,"abstract":"Blood doping has been challenging the scientific community since the early 1970's, where it was demonstrated that blood transfusion significantly improves physical performance. Here, we present through 3 applications how statistical classification techniques can assist the implementation of effective tests to deter blood doping in elite sports. In particular, we developed a new indirect and universal test of blood doping, called Abnormal Blood Profile Score (ABPS), based on the statistical classification of indirect biomarkers of altered erythropoiesis. Up to 601 hematological profiles have been compiled in a reference database. Twenty-one of them were obtained from blood samples withdrawn from professional athletes convicted of blood doping by other direct tests. Discriminative training algorithms were used jointly with cross-validation techniques to map these labeled reference profiles to target outputs. The strict cross-validation procedure facilitates the adherence to medico-legal standards mandated by the World Anti Doping Agency (WADA). The test has a sensitivity to recombinant erythropoietin (rhEPO) abuse up to 3 times better than current generative models, independently whether the athlete is currently taking rhEPO or has stopped the treatment. The test is also sensitive to any form of blood transfusion, autologous transfusion included. We finally conclude why a probabilistic approach should be encouraged for the evaluation of evidence in anti-doping area of investigation.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1011","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating a Survival Distribution with Current Status Data and High-dimensional Covariates","authors":"A. van der Vaart, M. J. van der Laan","doi":"10.2202/1557-4679.1014","DOIUrl":"https://doi.org/10.2202/1557-4679.1014","url":null,"abstract":"We consider the inverse problem of estimating a survival distribution when the survival times are only observed to be in one of the intervals of a random bisection of the time axis. We are particularly interested in the case that high-dimensional and/or time-dependent covariates are available, and/or the survival events and censoring times are only conditionally independent given the covariate process. The method of estimation consists of regularizing the survival distribution by taking the primitive function or smoothing, estimating the regularized parameter by using estimating equations, and finally recovering an estimator for the parameter of interest.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1014","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for Variable Importance","authors":"M. J. van der Laan","doi":"10.2202/1557-4679.1008","DOIUrl":"https://doi.org/10.2202/1557-4679.1008","url":null,"abstract":"Many statistical problems involve the learning of an importance/effect of a variable for predicting an outcome of interest based on observing a sample of $n$ independent and identically distributed observations on a list of input variables and an outcome. For example, though prediction/machine learning is, in principle, concerned with learning the optimal unknown mapping from input variables to an outcome from the data, the typical reported output is a list of importance measures for each input variable. The approach in prediction has been to learn the unknown optimal predictor from the data and derive, for each of the input variables, the variable importance from the obtained fit. In this article we propose a new approach which involves for each variable separately 1) defining variable importance as a real valued parameter, 2) deriving the efficient influence curve and thereby optimal estimating function for this parameter in the assumed (possibly nonparametric) model, and 3) develop a corresponding double robust locally efficient estimator of this variable importance, obtained by substituting for the nuisance parameters in the optimal estimating function data adaptive estimators. We illustrate this methodology in the context of prediction, and obtain in this manner double robust locally optimal estimators of marginal variable importance, accompanied with p-values and confidence intervals. In addition, we present a model based and machine learning approach to estimate covariate-adjusted variable importance. Finally, we generalize this methodology to variable importance parameters for time-dependent variables.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of the Projected Length of the Curve (PLC) and Area Swept out by the Curve (ASC) Indices for the Receiver Operating Characteristic (SROC) Curve","authors":"Xuan Zhang, S. Walter, R. Agnihotram","doi":"10.2202/1557-4679.1096","DOIUrl":"https://doi.org/10.2202/1557-4679.1096","url":null,"abstract":"Several measures have been proposed to summarize the Receiver Operating Characteristic (ROC) curve, including the Projected Length of the Curve (PLC) and the Area Swept out by the Curve (ASC). These indices were first proposed by Lee (Epidemiology 1996; 7:605-611) to avoid certain deficiencies of the traditional Area Under the Curve (AUC) summary measure. More recently meta-analysis methods for assessing diagnostic test accuracy have been developed and the Summary Receiver Operating Characteristic (SROC) curve has been recommended to represent the performance of a diagnostic test. Some properties of the SROC curve were discussed by Walter (Statist. Med. 2002; 21:1237-1256). Here we extend that work to focus on properties of PLC and ASC in the context of SROC curve. Mathematical expressions for these two indices and their variances are derived in terms of the overall diagnostic odds ratio and the magnitude of inter-study heterogeneity in the odds ratio. Expressions for PLC and ASC and their variances are easily computed in homogeneous studies, and their values provide good approximations to the corresponding values for heterogeneous studies in most practical situations. General variances of PLC and ASC are derived by using delta methods, and are found to be smaller if the odds ratio is large. The methods are illustrated using data from two studies, the first being a meta-analysis on the detection of metastases in cervical cancer patients, and the second being a single study of HPV infection and pre-invasive cervical lesions.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"5 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1096","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Akaike Information Criterion for Generalized Log-Gamma Regression Models","authors":"Xiaogang Su, Chih-Ling Tsai","doi":"10.2202/1557-4679.1032","DOIUrl":"https://doi.org/10.2202/1557-4679.1032","url":null,"abstract":"We propose an improved Akaike information criterion (AICc) for generalized log-gamma regression models, which include the extreme-value and normal regression models as special cases. Moreover, we extend our proposed criterion to situations when the data contain censored observations. Monte Carlo results show that AICc outperforms the classical Akaike information criterion (AIC), and an empirical example is presented to illustrate its usefulness.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Use of K-Fold Cross-Validation to Choose Cutoff Values and Assess the Performance of Predictive Models in Stepwise Regression","authors":"Z. Mahmood, Salahuddin J. Khan","doi":"10.2202/1557-4679.1105","DOIUrl":"https://doi.org/10.2202/1557-4679.1105","url":null,"abstract":"This paper addresses a methodological technique of leave-many-out cross-validation for choosing cutoff values in stepwise regression methods for simplifying the final regression model. A practical approach to choose cutoff values through cross-validation is to compute the minimum Predicted Residual Sum of Squares (PRESS). A leave-one-out cross-validation may overestimate the predictive model capabilities, for example see Shao (1993) and So et al (2000). Shao proves with asymptotic results and simulation that the model with the minimum value for the leave-one-out cross validation estimate of predictor errors is often over specified. That is, too many insignificant variables are contained in set βi of the regression model. He recommended using a method that leaves out a subset of observations, called K-fold cross-validation. Leave-many-out procedures can be more adequate in order to obtain significant and optimal results. We describe various investigations for the assessment of performance of predictive regression models, including different values of K in K-fold cross-validation and selecting the best possible cutoff values for automated model selection methods. We propose a resampling procedure by introducing alternative estimates of boosted cross-validated PRESS values for deciding the number of observations (l) to be omitted and number of folds/subsets (K) subsequently in K-fold cross-validation. Salahuddin and Hawkes (1991) used leave-one-out cross-validation to select equal cutoff values in stepwise regression which minimizes PRESS. We concentrate on applying K-fold cross-validation to choose unequal cutoff values that is F-to-enter and F-to-remove values which are then used for determining predictor variables in a regression model from the full data set. Our computer program for K-fold cross-validation can be efficiently used for choosing both equal and unequal cutoff values for automated model selection methods. Some previously analyzed data and Monte Carlo simulation are used to evaluate the proposed method against alternatives through a design experiment approach.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"5 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1105","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}