Genome-wide association study (GWAS) analysis, the most successful technique for discovering disease-related genetic variation, raises several statistical concerns: multiple testing, correlation among variants (single-nucleotide polymorphisms) due to linkage disequilibrium, and the omission of important variants when the model is fitted with one variant at a time. To mitigate these problems in a study with a small sample size, we used a sparse Bayesian learning model to identify genetic variants associated with bipolar disorder (BD).
This study used the Wellcome Trust Case Control Consortium data set, comprising 1998 BD cases and 1500 controls; after quality control, 380,628 variants were analysed. All variants were considered simultaneously in a single Bayesian logistic model with hierarchical shrinkage spike-and-slab priors. To reduce the computational burden, Bayesian variational inference was used as an alternative inferential method.
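To make the idea of fitting all variants at once under a spike-and-slab prior more concrete, the following is a minimal sketch of coordinate-ascent variational inference on simulated genotypes. It is not the model used in this study: it swaps in a linear (Gaussian) likelihood for the logistic one, fixes the hyperparameters instead of placing hierarchical priors on them, and uses simulated data. The function name `cavi_spike_slab` and the parameters `pi`, `slab_var` and `noise_var` are hypothetical names chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cavi_spike_slab(X, y, pi=0.05, slab_var=1.0, noise_var=1.0,
                    n_iter=100, tol=1e-6):
    """Mean-field variational fit of a spike-and-slab linear regression.

    Assumed prior (illustrative): beta_j = 0 with probability 1 - pi,
    otherwise beta_j ~ N(0, slab_var * noise_var).
    Returns alpha (posterior inclusion probabilities) and mu (slab means).
    """
    n, p = X.shape
    xty = X.T @ y
    d = np.einsum("ij,ij->j", X, X)          # x_j' x_j for each variant
    s2 = noise_var / (d + 1.0 / slab_var)    # variational slab variances
    alpha = np.full(p, pi)
    mu = np.zeros(p)
    fitted = X @ (alpha * mu)                # X E[beta] under q
    logit_pi = np.log(pi / (1.0 - pi))
    for _ in range(n_iter):
        alpha_old = alpha.copy()
        for j in range(p):
            r_j = alpha[j] * mu[j]
            # x_j'y minus the contribution of all other variants
            partial = xty[j] - X[:, j] @ fitted + d[j] * r_j
            mu[j] = s2[j] / noise_var * partial
            log_odds = (logit_pi
                        + 0.5 * np.log(s2[j] / (slab_var * noise_var))
                        + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = sigmoid(log_odds)
            fitted += X[:, j] * (alpha[j] * mu[j] - r_j)
        if np.max(np.abs(alpha - alpha_old)) < tol:
            break
    return alpha, mu

# Simulated example (not the study data): 300 samples, 50 variants,
# three of which have a non-zero effect on a continuous phenotype.
rng = np.random.default_rng(0)
n, p = 300, 50
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # 0/1/2 genotype counts
X = G - G.mean(axis=0)                               # centre each variant
causal = np.array([3, 17, 41])
beta = np.zeros(p)
beta[causal] = [1.0, -1.0, 0.8]
y = X @ beta + rng.normal(size=n)
y = y - y.mean()

alpha, mu = cavi_spike_slab(X, y)
selected = np.flatnonzero(alpha > 0.5)               # variants called associated
```

Each variant receives a posterior inclusion probability in `alpha`, so selection comes from a single joint fit rather than from one test per variant. A case-control phenotype would need a logistic likelihood, which requires an additional approximation that this sketch does not attempt.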
Thirteen variants were selected as associated with BD. Three of them (rs7572953, rs1378850 and rs4148944) had been reported in previous GWAS. Eight were related to hemogram parameters, such as lymphocyte percentage, plateletcrit and haemoglobin concentration. Among the genes related to the selected variants, GABPA, ELF3 and JAM2 were enriched in the platelet-derived growth factor pathway. These three genes, along with APP, ARL8A, CDH23 and GPR37L1, could be differential diagnostic variants for BD.
By reducing the statistical restrictions of GWAS analysis, Bayesian variational spike-and-slab models can offer insight into the genetic basis of BD even with a small sample size. The model needs further examination to determine whether it can uncover variants associated with other traits.