{"title":"Penalized logistic regression with prior information for microarray gene expression classification.","authors":"Murat Genç","doi":"10.1515/ijb-2022-0025","DOIUrl":"10.1515/ijb-2022-0025","url":null,"abstract":"<p><p>Cancer classification and gene selection are important applications in DNA microarray gene expression data analysis. Since DNA microarray data suffers from the high-dimensionality problem, automatic gene selection methods are used to enhance the classification performance of expert classifier systems. In this paper, a new penalized logistic regression method that performs simultaneous gene coefficient estimation and variable selection in DNA microarray data is discussed. The method employs prior information about the gene coefficients to improve the classification accuracy of the underlying model. The coordinate descent algorithm with screening rules is given to obtain the gene coefficient estimates of the proposed method efficiently. The performance of the method is examined on five high-dimensional cancer classification datasets using the area under the curve, the number of selected genes, misclassification rate and F-score measures. The real data analysis results indicate that the proposed method achieves a good cancer classification performance with a small misclassification rate, large area under the curve and F-score by trading off some sparsity level of the underlying model. 
Hence, the proposed method can be seen as a reliable penalized logistic regression method in the scope of high-dimensional cancer classification.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40707390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Masoumeh Shirozhan, Naushad A Mamode Khan, Célestin C Kokonendji
{"title":"The balanced discrete triplet Lindley model and its INAR(1) extension: properties and COVID-19 applications.","authors":"Masoumeh Shirozhan, Naushad A Mamode Khan, Célestin C Kokonendji","doi":"10.1515/ijb-2022-0001","DOIUrl":"10.1515/ijb-2022-0001","url":null,"abstract":"<p><p>This paper proposes a new flexible discrete triplet Lindley model that is constructed from the balanced discretization principle of the extended Lindley distribution. This model has several appealing statistical properties in terms of providing exact and closed form moment expressions and handling all forms of dispersion. Due to these, this paper explores further the usage of the discrete triplet Lindley as an innovation distribution in the simple integer-valued autoregressive process (INAR(1)). This subsequently allows for the modeling of count time series observations. In this context, a novel INAR(1) process is developed under mixed Binomial and the Pegram thinning operators. The model parameters of the INAR(1) process are estimated using the conditional maximum likelihood and Yule-Walker approaches. Some Monte Carlo simulation experiments are executed to assess the consistency of the estimators under the two estimation approaches. Interestingly, the proposed INAR(1) process is applied to analyze the COVID-19 cases and death series of different countries where it yields reliable parameter estimates and suitable forecasts via the modified Sieve bootstrap technique. 
On the other side, the new INAR(1) with discrete triplet Lindley innovations competes comfortably with other established INAR(1)s in the literature.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40721904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Potential application of elastic nets for shared polygenicity detection with adapted threshold selection.","authors":"Majnu John, Todd Lencz","doi":"10.1515/ijb-2020-0108","DOIUrl":"10.1515/ijb-2020-0108","url":null,"abstract":"<p><p>Current research suggests that hundreds to thousands of single nucleotide polymorphisms (SNPs) with small to modest effect sizes contribute to the genetic basis of many disorders, a phenomenon labeled as polygenicity. Additionally, many such disorders demonstrate polygenic overlap, in which risk alleles are shared at associated genetic loci. A simple strategy to detect polygenic overlap between two phenotypes is based on rank-ordering the univariate <i>p</i>-values from two genome-wide association studies (GWASs). Although high-dimensional variable selection strategies such as Lasso and elastic nets have been utilized in other GWAS analysis settings, they are yet to be utilized for detecting shared polygenicity. In this paper, we illustrate how elastic nets, with polygenic scores as the dependent variable and with appropriate adaptation in selecting the penalty parameter, may be utilized for detecting a subset of SNPs involved in shared polygenicity. We provide theory to better understand our approaches, and illustrate their utility using synthetic datasets. Results from extensive simulations are presented comparing the elastic net approaches with the rank ordering approach, in various scenarios. Results from simulations studies exhibit one of the elastic net approaches to be superior when the correlations among the SNPs are high. 
Finally, we apply the methods on two real datasets to illustrate further the capabilities, limitations and differences among the methods.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10154439/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9401096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian semiparametric accelerate failure time mixture cure model.","authors":"Yijun Wang, Weiwei Wang, Yincai Tang","doi":"10.1515/ijb-2021-0012","DOIUrl":"https://doi.org/10.1515/ijb-2021-0012","url":null,"abstract":"<p><p>The accelerated failure time mixture cure (AFTMC) model is widely used for survival data when a portion of patients can be cured. In this paper, a Bayesian semiparametric method is proposed to obtain the estimation of parameters and density distribution for both the cure probability and the survival distribution of the uncured patients in the AFTMC model. Specifically, the baseline error distribution of the uncured patients is nonparametrically modeled by a mixture of Dirichlet process. Based on the stick-breaking formulation of the Dirichlet process, the techniques of retrospective and slice sampling, an efficient and easy-to-implement Gibbs sampler is developed for the posterior calculation. The proposed approach can be easily implemented in commonly used statistical softwares, and its performance is comparable to fully parametric method via comprehensive simulation studies. Besides, the proposed approach is adopted to the analysis of a colorectal cancer clinical trial data.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10550584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asma Bahamyirou, Mireille E Schnitzer, Edward H Kennedy, Lucie Blais, Yi Yang
{"title":"Doubly robust adaptive LASSO for effect modifier discovery.","authors":"Asma Bahamyirou, Mireille E Schnitzer, Edward H Kennedy, Lucie Blais, Yi Yang","doi":"10.1515/ijb-2020-0073","DOIUrl":"https://doi.org/10.1515/ijb-2020-0073","url":null,"abstract":"<p><p>Effect modification occurs when the effect of a treatment on an outcome differs according to the level of some pre-treatment variable (the effect modifier). Assessing an effect modifier is not a straight-forward task even for a subject matter expert. In this paper, we propose a two-stage procedure to automatically select effect modifying variables in a Marginal Structural Model (MSM) with a single time point exposure based on the two nuisance quantities (the conditional outcome expectation and propensity score). We highlight the performance of our proposal in a simulation study. Finally, to illustrate tractability of our proposed methods, we apply them to analyze a set of pregnancy data. We estimate the conditional expected difference in the counterfactual birth weight if all women were exposed to inhaled corticosteroids during pregnancy versus the counterfactual birthweight if all women were not, using data from asthma medications during pregnancy.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10550593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal inference under interference with prognostic scores for dynamic group therapy studies.","authors":"Bing Han, Susan M Paddock, Lane Burgette","doi":"10.1515/ijb-2019-0126","DOIUrl":"https://doi.org/10.1515/ijb-2019-0126","url":null,"abstract":"<p><p>Group therapy is a common treatment modality for behavioral health conditions. Patients often enter and exit groups on an ongoing basis, leading to dynamic therapy groups. Examining the effect of high versus low session attendance on patient outcomes is a research question of interest. However, there are several challenges to identifying causal effects in this setting, including the lack of randomization, interference among patients, and the interrelatedness of patient participation. Dynamic therapy groups motivate a unique causal inference scenario, as the treatment statuses are completely defined by the patient attendance record for the therapy session, which is also the structure inducing interference. We adopt the Rubin causal model framework to define the causal effect of high versus low session attendance of group therapy at both the individual patient and peer levels. We propose a strategy to identify individual, peer, and total effects of high attendance versus low attendance on patient outcomes by the prognostic score stratification. 
We examine performance of our approach via simulation and apply it to data from a group cognitive behavioral therapy trial for treating depression among patients in a substance use disorders treatment setting.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9973534/pdf/nihms-1876458.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10866225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The number of response categories in ordered response models.","authors":"Maria Iannario, Anna Clara Monti, Pietro Scalera","doi":"10.1515/ijb-2021-0013","DOIUrl":"https://doi.org/10.1515/ijb-2021-0013","url":null,"abstract":"<p><p>The choice of the number <i>m</i> of response categories is a crucial issue in categorization of a continuous response. The paper exploits the Proportional Odds Models' property which allows to generate ordinal responses with a different number of categories from the same underlying variable. It investigates the asymptotic efficiency of the estimators of the regression coefficients and the accuracy of the derived inferential procedures when <i>m</i> varies. The analysis is based on models with closed-form information matrices so that the asymptotic efficiency can be analytically evaluated without need of simulations. The paper proves that a finer categorization augments the information content of the data and consequently shows that the asymptotic efficiency and the power of the tests on the regression coefficients increase with <i>m</i>. The impact of the loss of information produced by merging categories on the efficiency of the estimators is also considered, highlighting its risks especially when performed in its extreme form of dichotomization. Furthermore, the appropriate value of <i>m</i> for various sample sizes is explored, pointing out that a large number of categories can offset the limited amount of information of a small sample by a better quality of the data. 
Finally, two case studies on the quality of life of chemotherapy patients and on the perception of pain, based on discretized continuous scales, illustrate the main findings of the paper.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10844729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, Charles Fisher
{"title":"Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score.","authors":"Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, Charles Fisher","doi":"10.1515/ijb-2021-0072","DOIUrl":"https://doi.org/10.1515/ijb-2021-0072","url":null,"abstract":"<p><p>Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects' predicted outcomes (their <i>prognostic scores</i>). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. 
Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10844739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hermine Biermé, Camille Constant, Anne Duittoz, Christine Georgelin
{"title":"Spike detection for calcium activity.","authors":"Hermine Biermé, Camille Constant, Anne Duittoz, Christine Georgelin","doi":"10.1515/ijb-2020-0043","DOIUrl":"https://doi.org/10.1515/ijb-2020-0043","url":null,"abstract":"<p><p>We present in this paper a global methodology for the spike detection in a biological context of fluorescence recording of GnRH-neurons calcium activity. For this purpose we first propose a simple stochastic model that could mimic experimental time series by considering an autoregressive AR(1) process with a linear trend and specific innovations involving spiking times. Estimators of parameters with asymptotic normality are established and used to set up a statistical test on estimated innovations in order to detect spikes. We compare several procedures and illustrate on biological data the performance of our procedure.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10550586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charlotte Castel, Cécile Sommen, Edouard Chatignoux, Yann Le Strat, Ahmadou Alioum
{"title":"Bayesian multi-response nonlinear mixed-effect model: application of two recent HIV infection biomarkers.","authors":"Charlotte Castel, Cécile Sommen, Edouard Chatignoux, Yann Le Strat, Ahmadou Alioum","doi":"10.1515/ijb-2021-0030","DOIUrl":"https://doi.org/10.1515/ijb-2021-0030","url":null,"abstract":"<p><p>Since the discovery of the human immunodeficiency virus (HIV) 35 years ago, the epidemic is still ongoing in France. To monitor the dynamics of HIV transmission and assess the impact of prevention campaigns, the main indicator is the incidence. One method to estimate the HIV incidence is based on biomarker values at diagnosis and their dynamics over time. Estimating the HIV incidence from biomarkers first requires modeling their dynamics since infection using external longitudinal data. The objective of the work presented here is to estimate the joint dynamics of two biomarkers from the PRIMO cohort. We thus jointly modeled the dynamics of two biomarkers (TM and V3) using a multi-response nonlinear mixed-effect model. The parameters were estimated using Bayesian Hamiltonian Monte Carlo inference. This procedure was first applied to the real data of the PRIMO cohort. In a simulation study, we then evaluated the performance of the Bayesian procedure for estimating the parameters of multi-response nonlinear mixed-effect models.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10485529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}