{"title":"Estimation and Inference for the Mediation Proportion.","authors":"Daniel Nevo, Xiaomei Liao, Donna Spiegelman","doi":"10.1515/ijb-2017-0006","DOIUrl":"https://doi.org/10.1515/ijb-2017-0006","url":null,"abstract":"<p><p>In epidemiology, public health and social science, mediation analysis is often undertaken to investigate the extent to which the effect of a risk factor on an outcome of interest is mediated by other covariates. A pivotal quantity of interest in such an analysis is the mediation proportion. A common method for estimating it, termed the \"difference method\", compares estimates from models with and without the hypothesized mediator. However, rigorous methodology for estimation and statistical inference for this quantity has not previously been available. We formulated the problem for the Cox model and generalized linear models, and utilized a data duplication algorithm together with a generalized estimation equations approach for estimating the mediation proportion and its variance. We further considered the assumption that the same link function holds for the marginal and conditional models, a property which we term \"g-linkability\". We show that our approach is valid whenever g-linkability holds, exactly or approximately, and present results from an extensive simulation study to explore finite sample properties. The methodology is illustrated by an analysis of pre-menopausal breast cancer incidence in the Nurses' Health Study. User-friendly publicly available software implementing those methods can be downloaded from the last author's website (SAS) or from CRAN (R).</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2017-0006","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35372681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling Mixed Types of Outcomes in Additive Genetic Models.","authors":"Wagner Hugo Bonat","doi":"10.1515/ijb-2017-0001","DOIUrl":"https://doi.org/10.1515/ijb-2017-0001","url":null,"abstract":"<p><p>We present a general statistical modelling framework for handling multivariate mixed types of outcomes in the context of quantitative genetic analysis. The models are based on the multivariate covariance generalized linear models, where the matrix linear predictor is composed of an identity matrix combined with a relatedness matrix defined by a pedigree, representing the environmental and genetic components, respectively. We also propose a new index of heritability for non-Gaussian data. A case study on house sparrow (Passer domesticus) population with continuous, binomial and count outcomes is employed to motivate the new model. Simulation of multivariate marginal models is not trivial, thus we adapt the NORTA (Normal to anything) algorithm for simulation of multivariate covariance generalized linear models in the context of genetic data analysis. A simulation study is presented to assess the asymptotic properties of the estimating function estimators for the correlation between outcomes and the new heritability index parameters. The data set and R code are available in the supplementary material.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2017-0001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35152532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-Based Measure of Variable Importance for Genetic Association Studies.","authors":"Vicente Gallego, M Luz Calle, Ramon Oller","doi":"10.1515/ijb-2016-0087","DOIUrl":"https://doi.org/10.1515/ijb-2016-0087","url":null,"abstract":"<p><p>The identification of genetic variants that are associated with disease risk is an important goal of genetic association studies. Standard approaches perform univariate analysis where each genetic variant, usually Single Nucleotide Polymorphisms (SNPs), is tested for association with disease status. Though many genetic variants have been identified and validated so far using this univariate approach, for most complex diseases a large part of their genetic component is still unknown, the so-called missing heritability. We propose a Kernel-based measure of variable importance (KVI) that provides the contribution of a SNP, or a group of SNPs, to the joint genetic effect of a set of genetic variants. KVI can be used for ranking genetic markers individually, sets of markers that form blocks of linkage disequilibrium or sets of genetic variants that lie in a gene or a genetic pathway. We prove that, unlike the univariate analysis, KVI captures the relationship with other genetic variants in the analysis, even when measured at the individual level for each genetic variable separately. This is especially relevant and powerful for detecting genetic interactions. We illustrate the results with data from an Alzheimer's disease study and show through simulations that the rankings based on KVI improve those rankings based on two measures of importance provided by the Random Forest. We also prove with a simulation study that KVI is very powerful for detecting genetic interactions.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0087","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35099476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Confidence Intervals for Intra- and Inter-subject Coefficients of Variation in Linear Mixed-effects Models.","authors":"Johannes Forkman","doi":"10.1515/ijb-2016-0093","DOIUrl":"https://doi.org/10.1515/ijb-2016-0093","url":null,"abstract":"<p><p>Linear mixed-effects models are linear models with several variance components. Models with a single random-effects factor have two variance components: the random-effects variance, i. e., the inter-subject variance, and the residual error variance, i. e., the intra-subject variance. In many applications, it is practice to report variance components as coefficients of variation. The intra- and inter-subject coefficients of variation are the square roots of the corresponding variances divided by the mean. This article proposes methods for computing confidence intervals for intra- and inter-subject coefficients of variation using generalized pivotal quantities. The methods are illustrated through two examples. In the first example, precision is assessed within and between runs in a bioanalytical method validation. In the second example, variation is estimated within and between main plots in an agricultural split-plot experiment. Coverage of generalized confidence intervals is investigated through simulation and shown to be close to the nominal value.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0093","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35138437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big Data, Small Sample.","authors":"Inna Gerlovina, Mark J van der Laan, Alan Hubbard","doi":"10.1515/ijb-2017-0012","DOIUrl":"https://doi.org/10.1515/ijb-2017-0012","url":null,"abstract":"<p><p>Multiple comparisons and small sample size, common characteristics of many types of \"Big Data\" including those that are produced by genomic studies, present specific challenges that affect reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition that is necessary to guarantee error rate control given practical sample sizes, linking the number of tests and the sample size; this condition, however, is rarely satisfied. Using methods that are based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially wide-spread problems with error rate control, specifically excessive false positives. This is an important factor that contributes to the \"reproducibility crisis\". We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing/small sample data. We point out that Edgeworth expansions, providing higher order approximations to the sampling distribution, offer a promising direction for data analysis that could improve reliability of studies relying on large numbers of comparisons with modest sample sizes.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2017-0012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35076952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvement Screening for Ultra-High Dimensional Data with Censored Survival Outcomes and Varying Coefficients.","authors":"Mu Yue, Jialiang Li","doi":"10.1515/ijb-2017-0024","DOIUrl":"https://doi.org/10.1515/ijb-2017-0024","url":null,"abstract":"<p><p>Motivated by risk prediction studies with ultra-high dimensional biomarkers, we propose a novel improvement screening methodology. Accurate risk prediction can be quite useful for patient treatment selection, prevention strategy or disease management in evidence-based medicine. The question of how to choose new markers in addition to the conventional ones is especially important. In the past decade, a number of new measures for quantifying the added value from the new markers were proposed, among which the integrated discrimination improvement (IDI) and net reclassification improvement (NRI) stand out. Meanwhile, C-statistics are routinely used to quantify the capacity of the estimated risk score in discriminating among subjects with different event times. In this paper, we will examine these improvement statistics as well as the norm-based approach for evaluating the incremental values of new markers and compare these four measures by analyzing ultra-high dimensional censored survival data. In particular, we consider Cox proportional hazards models with varying coefficients. All measures perform very well in simulations and we illustrate our methods in an application to a lung cancer study.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2017-0024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35027552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Four Methods for Estimating Tree-Based Treatment Regimes.","authors":"Aniek Sies, Iven Van Mechelen","doi":"10.1515/ijb-2016-0068","DOIUrl":"https://doi.org/10.1515/ijb-2016-0068","url":null,"abstract":"<p><p>When multiple treatment alternatives are available for a certain psychological or medical problem, an important challenge is to find an optimal treatment regime, which specifies for each patient the most effective treatment alternative given his or her pattern of pretreatment characteristics. The focus of this paper is on tree-based treatment regimes, which link an optimal treatment alternative to each leaf of a tree; as such they provide an insightful representation of the decision structure underlying the regime. This paper compares the absolute and relative performance of four methods for estimating regimes of that sort (viz., Interaction Trees, Model-based Recursive Partitioning, an approach developed by Zhang et al. and Qualitative Interaction Trees) in an extensive simulation study. The evaluation criteria were, on the one hand, the expected outcome if the entire population would be subjected to the treatment regime resulting from each method under study and the proportion of clients assigned to the truly best treatment alternative, and, on the other hand, the Type I and Type II error probabilities of each method. The method of Zhang et al. was superior regarding the first two outcome measures and the Type II error probabilities, but performed worst in some conditions of the simulation study regarding Type I error probabilities.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0068","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35013211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Tests for High-dimensional Failure Time Data with the Additive Hazards Models.","authors":"Dandan Jiang, Jianguo Sun","doi":"10.1515/ijb-2016-0085","DOIUrl":"https://doi.org/10.1515/ijb-2016-0085","url":null,"abstract":"<p><p>Statistical analysis of high-dimensional data has been attracting more and more attention due to the abundance of such data in various fields such as genetic studies or genomics and the existence of many interesting topics. Among them, one is the identification of a gene or genes that have significant effects on the occurrence of or are significantly related to a certain disease. In this paper, we will discuss such a problem that can be formulated as a group test or testing a group of variables or coefficients when one faces a right-censored failure time response variable. For the problem, we develop a corrected variance reduced partial profiling (CVRPP) linear regression model and a likelihood ratio test procedure when the failure time of interest follows the additive hazards model. The numerical study suggests that the proposed method works well in practical situations and gives better performance than the existing one. An illustrative example is provided.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0085","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34986865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Median Analysis of Repeated Measures Associated with Recurrent Events in Presence of Terminal Event.","authors":"Rajeshwari Sundaram, Ling Ma, Subhashis Ghoshal","doi":"10.1515/ijb-2016-0057","DOIUrl":"https://doi.org/10.1515/ijb-2016-0057","url":null,"abstract":"<p><p>Recurrent events are often encountered in medical follow-up studies. In addition, such recurrences have other quantities associated with them that are of considerable interest, for instance medical costs of the repeated hospitalizations and tumor size in cancer recurrences. These processes can be viewed as point processes, i.e. processes with arbitrary positive jump at each recurrence. An analysis of the mean function for such point processes has been proposed in the literature. However, such point processes are often skewed, leading to median as a more appropriate measure than the mean. Furthermore, the analysis of recurrent event data is often complicated by the presence of death. We propose a semiparametric model for assessing the effect of covariates on the quantiles of the point processes. We investigate both the finite sample as well as the large sample properties of the proposed estimators. We conclude with a real data analysis of the medical cost associated with the treatment of ovarian cancer.</p>","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0057","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34951162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.","authors":"Yanmei Xie, Biao Zhang","doi":"10.1515/ijb-2016-0053","DOIUrl":"https://doi.org/10.1515/ijb-2016-0053","url":null,"abstract":"Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719–30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2017-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2016-0053","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34940139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}