{"title":"A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes.","authors":"Julien St-Pierre, Karim Oualkacha","doi":"10.1515/ijb-2022-0010","DOIUrl":"10.1515/ijb-2022-0010","url":null,"abstract":"<p><p>In genome wide association studies (GWAS), researchers are often dealing with dichotomous and non-normally distributed traits, or a mixture of discrete-continuous traits. However, most of the current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits. Therefore, there is a need to develop unified and flexible methods to study association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic <i>p</i>-value procedure of the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well controlled type I error rates and higher power to detect associations compared with other existing methods, for discrete and non-normally distributed traits. 
Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10644254/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10749076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of atypical response trajectories in biomedical longitudinal databases.","authors":"Lucio José Pantazis, Rafael Antonio García","doi":"10.1515/ijb-2020-0076","DOIUrl":"10.1515/ijb-2020-0076","url":null,"abstract":"<p><p>Many health care professionals and institutions manage longitudinal databases, involving follow-ups for different patients over time. Longitudinal data frequently manifest additional complexities such as high variability, correlated measurements and missing data. Mixed effects models have been widely used to overcome these difficulties. This work proposes linear mixed effects models as a tool for searching simultaneously for conceptually different types of anomalies in the data.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40569794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal inference for oncology: past developments and current challenges.","authors":"Erica E M Moodie","doi":"10.1515/ijb-2022-0056","DOIUrl":"10.1515/ijb-2022-0056","url":null,"abstract":"<p><p>In this paper, we review some important early developments on causal inference in medical statistics and epidemiology that were inspired by questions in oncology. We examine two classical examples from the literature and point to a current area of ongoing methodological development, namely the estimation of optimal adaptive treatment strategies. While causal approaches to analysis have become more routine in oncology research, many exciting challenges and open problems remain, particularly in the context of censored outcomes.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40342739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient estimation of pathwise differentiable target parameters with the undersmoothed highly adaptive lasso.","authors":"Mark J van der Laan, David Benkeser, Weixin Cai","doi":"10.1515/ijb-2019-0092","DOIUrl":"10.1515/ijb-2019-0092","url":null,"abstract":"<p><p>We consider estimation of a functional parameter of a realistically modeled data distribution based on observing independent and identically distributed observations. The highly adaptive lasso estimator of the functional parameter is defined as the minimizer of the empirical risk over a class of cadlag functions with finite sectional variation norm, where the functional parameter is parametrized in terms of such a class of functions. In this article we establish that this HAL estimator yields an asymptotically efficient estimator of any smooth feature of the functional parameter under a global undersmoothing condition. It is formally shown that the <i>L</i> <sub>1</sub>-restriction in HAL does not obstruct it from solving the score equations along paths that do not enforce this condition. Therefore, from an asymptotic point of view, the only reason for undersmoothing is that the true target function might not be complex so that the HAL-fit leaves out key basis functions that are needed to span the desired efficient influence curve of the smooth target parameter. Nonetheless, in practice undersmoothing appears to be beneficial and a simple targeted method is proposed and practically verified to perform well. We demonstrate our general result HAL-estimator of a treatment-specific mean and of the integrated square density. 
We also present simulations for these two examples confirming the theory.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10238856/pdf/ijb-19-1-ijb-2019-0092.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9560776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A varying-coefficient partially linear transformation model for length-biased data with an application to HIV vaccine studies.","authors":"Alan T K Wan, Wei Zhao, Peter Gilbert, Yong Zhou","doi":"10.1515/ijb-2021-0057","DOIUrl":"10.1515/ijb-2021-0057","url":null,"abstract":"<p><p>Prevalent cohort studies in medical research often give rise to length-biased survival data that require special treatments. The recently proposed varying-coefficient partially linear transformation (VCPLT) model has the virtue of providing a more dynamic content of the effects of the covariates on survival times than the well-known partially linear transformation (PLT) model by allowing flexible interactions between the covariates. However, no existing analysis of the VCPLT model has considered length-biased sampling. In this paper, we consider the VCPLT model when the data are length-biased and right censored, thereby extending the reach of this flexible and powerful tool. We develop a martingale estimating function-based approach to the estimation of this model, provide theoretical underpinnings, evaluate finite sample performance via simulations, and showcase its practical appeal via an empirical application using data from two HIV vaccine clinical trials conducted by the U.S. National Institute of Allergy and Infectious Diseases.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9832178/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9567667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The optimal dynamic treatment rule superlearner: considerations, performance, and application to criminal justice interventions.","authors":"Lina M Montoya, Mark J van der Laan, Alexander R Luedtke, Jennifer L Skeem, Jeremy R Coyle, Maya L Petersen","doi":"10.1515/ijb-2020-0127","DOIUrl":"10.1515/ijb-2020-0127","url":null,"abstract":"<p><p>The optimal dynamic treatment rule (ODTR) framework offers an approach for understanding which kinds of patients respond best to specific treatments - in other words, treatment effect heterogeneity. Recently, there has been a proliferation of methods for estimating the ODTR. One such method is an extension of the SuperLearner algorithm - an ensemble method to optimally combine candidate algorithms extensively used in prediction problems - to ODTRs. Following the \"causal roadmap,\" we causally and statistically define the ODTR and provide an introduction to estimating it using the ODTR SuperLearner. Additionally, we highlight practical choices when implementing the algorithm, including choice of candidate algorithms, metalearners to combine the candidates, and risk functions to select the best combination of algorithms. Using simulations, we illustrate how estimating the ODTR using this SuperLearner approach can uncover treatment effect heterogeneity more effectively than traditional approaches based on fitting a parametric regression of the outcome on the treatment, covariates and treatment-covariate interactions. We investigate the implications of choices in implementing an ODTR SuperLearner at various sample sizes. Our results show the advantages of: (1) including a combination of both flexible machine learning algorithms and simple parametric estimators in the library of candidate algorithms; (2) using an ensemble metalearner to combine candidates rather than selecting only the best-performing candidate; (3) using the mean outcome under the rule as a risk function. 
Finally, we apply the ODTR SuperLearner to the \"Interventions\" study, an ongoing randomized controlled trial, to identify which justice-involved adults with mental illness benefit most from cognitive behavioral therapy to reduce criminal re-offending.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10238854/pdf/ijb-19-1-ijb-2020-0127.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9925259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimators for the value of the optimal dynamic treatment rule with application to criminal justice interventions.","authors":"Lina M Montoya, Mark J van der Laan, Jennifer L Skeem, Maya L Petersen","doi":"10.1515/ijb-2020-0128","DOIUrl":"10.1515/ijb-2020-0128","url":null,"abstract":"<p><p>Given an (optimal) dynamic treatment rule, it may be of interest to evaluate that rule - that is, to ask the causal question: what is the expected outcome had every subject received treatment according to that rule? In this paper, we study the performance of estimators that approximate the true value of: (1) an <i>a priori</i> known dynamic treatment rule (2) the true, unknown optimal dynamic treatment rule (ODTR); (3) an estimated ODTR, a so-called \"data-adaptive parameter,\" whose true value depends on the sample. Using simulations of point-treatment data, we specifically investigate: (1) the impact of increasingly data-adaptive estimation of nuisance parameters and/or of the ODTR on performance; (2) the potential for improved efficiency and bias reduction through the use of semiparametric efficient estimators; and, (3) the importance of sample splitting based on the cross-validated targeted maximum likelihood estimator (CV-TMLE) for accurate inference. In the simulations considered, there was very little cost and many benefits to using CV-TMLE to estimate the value of the true and estimated ODTR; importantly, and in contrast to non cross-validated estimators, the performance of CV-TMLE was maintained even when highly data-adaptive algorithms were used to estimate both nuisance parameters and the ODTR. 
In addition, we apply these estimators for the value of the rule to the \"Interventions\" study, an ongoing randomized controlled trial, to identify whether assigning cognitive behavioral therapy (CBT) to criminal justice-involved adults with mental illness using an ODTR significantly reduces the probability of recidivism, compared to assigning CBT in a non-individualized way.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9722979/pdf/ijb-19-1-ijb-2020-0128.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9941666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of joint dichotomization and single dichotomization of interacting variables to discriminate a disease outcome.","authors":"Sybil Prince Nelson, Viswanathan Ramakrishnan, Paul Nietert, Diane Kamen, Paula Ramos, Bethany Wolf","doi":"10.1515/ijb-2021-0071","DOIUrl":"10.1515/ijb-2021-0071","url":null,"abstract":"<p><p>Dichotomization is often used in clinical and diagnostic settings to simplify interpretation. For example, a person with systolic and diastolic blood pressure above 140 over 90 may be prescribed medication. Blood pressure as well as other factors such as age and cholesterol and their interactions may lead to increased risk of certain diseases. When using a dichotomized variable to determine a diagnosis, if the interactions with other variables are not considered, then an incorrect threshold for the continuous variable may be selected. In this paper, we compare single dichotomization with joint dichotomization, the process of simultaneously optimizing cutpoints for multiple variables. A simulation study shows that simultaneous dichotomization of continuous variables is more accurate in recovering both 'true' thresholds, provided they exist.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9847323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression trees and ensembles for cumulative incidence functions.","authors":"Youngjoo Cho, Annette M Molinaro, Chen Hu, Robert L Strawderman","doi":"10.1515/ijb-2021-0014","DOIUrl":"10.1515/ijb-2021-0014","url":null,"abstract":"<p><p>The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past two decades. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and ensemble methods, have begun comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods use augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and lead to methods that are easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) is used to illustrate these new methods.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9509494/pdf/ijb-18-2-ijb-2021-0014.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10494393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal inference under over-simplified longitudinal causal models.","authors":"Lola Étiévant, Vivian Viallon","doi":"10.1515/ijb-2020-0081","DOIUrl":"10.1515/ijb-2020-0081","url":null,"abstract":"<p><p>Many causal models of interest in epidemiology involve longitudinal exposures, confounders and mediators. However, repeated measurements are not always available or used in practice, leading analysts to overlook the time-varying nature of exposures and work under over-simplified causal models. Our objective is to assess whether - and how - causal effects identified under such misspecified causal models relate to true causal effects of interest. We derive sufficient conditions ensuring that the quantities estimated in practice under over-simplified causal models can be expressed as weighted averages of longitudinal causal effects of interest. Unsurprisingly, these sufficient conditions are very restrictive, and our results state that the quantities estimated in practice should be interpreted with caution in general, as they usually do not relate to any longitudinal causal effect of interest. Our simulations further illustrate that the bias between the quantities estimated in practice and the weighted averages of longitudinal causal effects of interest can be substantial. 
Overall, our results confirm the need for repeated measurements to conduct proper analyses and/or the development of sensitivity analyses when they are not available.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10492690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}