Statistical Applications in Genetics and Molecular Biology: latest articles, page 10

AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-04-01 DOI: 10.1515/sagmb-2015-0074

M. Emily

Abstract: Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity as they confer advantages in both statistical power and biological interpretation. All of the gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, leading to limited power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests before aggregating the obtained p-values into a test at the gene level. Our method, called AGGrEGATOr, is based on a minP procedure that tests the significance of the minimum of a set of p-values. We use simulations to assess the capacity of AGGrEGATOr to correctly control type-I error. The benefits of our approach in terms of statistical power and robustness to SNP set characteristics are evaluated in a wide range of disease models by comparing it to previous methods. We also apply our method to detect gene pairs associated with rheumatoid arthritis (RA) on the GSE39428 dataset. We identify 13 potential gene-gene interactions and replicate one gene pair in the Wellcome Trust Case Control Consortium dataset at the 5% level. We further test 15 gene pairs, previously reported as being statistically associated with RA, Crohn’s disease (CD), or coronary artery disease (CAD), for replication in the Wellcome Trust Case Control Consortium dataset. We show that AGGrEGATOr is the only method able to successfully replicate seven gene pairs.

Volume 15, pages 151-171

Citations: 14
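
The minP idea named in the AGGrEGATOr abstract above (testing the significance of the minimum of a set of p-values) can be illustrated with a generic permutation-based sketch. This is not the AGGrEGATOr implementation, and the abstract does not say how the paper calibrates the minimum. The function name `minp_pvalue`, the use of permutations, and the toy numbers are all assumptions made for illustration.

```python
def minp_pvalue(observed, permuted):
    """Permutation-based minP p-value (illustrative sketch only).

    observed: list of SNP-SNP p-values for one gene pair.
    permuted: list of lists; each inner list holds the same SNP-SNP
              p-values recomputed on one permuted (null) dataset.
    """
    if not observed:
        raise ValueError("observed must contain at least one p-value")
    if any(len(row) != len(observed) for row in permuted):
        raise ValueError("each permuted row must match len(observed)")

    obs_min = min(observed)
    # Count null replicates whose smallest p-value is at least as extreme.
    hits = sum(1 for row in permuted if min(row) <= obs_min)
    # The +1 terms count the observed data as one draw from the null,
    # which keeps the estimate strictly positive.
    return (1 + hits) / (1 + len(permuted))


# Toy numbers, invented for illustration.
obs = [0.01, 0.5]
null = [[0.2, 0.3], [0.005, 0.9], [0.4, 0.6]]
print(minp_pvalue(obs, null))
```

Comparing minima rather than individual p-values is what makes the gene-level test account for the number of SNP pairs tested and for their correlation.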

What if we ignore the random effects when analyzing RNA-seq data in a multifactor experiment

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-04-01 DOI: 10.1515/sagmb-2015-0011

Shiqi Cui, Tieming Ji, Jilong Li, J. Cheng, Jing Qiu

Abstract: Identifying differentially expressed (DE) genes between different conditions is one of the main goals of RNA-seq data analysis. Although a large amount of early RNA-seq data was produced for two-group comparisons with small sample sizes, more and more RNA-seq data are being produced under complex experimental designs such as split-plot designs and repeated-measures designs. Data arising from such experiments are traditionally analyzed with mixed-effects models. An appropriate statistical approach for analyzing RNA-seq data from such designs should therefore be generalized linear mixed models (GLMMs) or similar approaches that allow for random effects. However, common practice in the literature is either to treat random effects as fixed or to ignore the experimental design completely and focus on two-group comparisons using partial data. In this paper, we examine the effect of ignoring the random effects when analyzing RNA-seq data. We do so by comparing the standard GLMM to methods that ignore the random effects, through simulation studies and real data analysis. Our studies show that ignoring random effects in a multi-factor experiment can lead to an increase in false positives among the top selected genes, or to lower power when the nominal FDR level is controlled.

Volume 15, pages 87-105

Citations: 23

Comparing five statistical methods of differential methylation identification using bisulfite sequencing data

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-04-01 DOI: 10.1515/sagmb-2015-0078

Xiaoqing Yu, Shuying Sun

Abstract: We present a comprehensive comparative analysis of five differential methylation (DM) identification methods developed for bisulfite sequencing (BS) data: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher. We summarize the features of these methods from several analytical aspects and compare their performance using both simulated and real BS datasets. Our comparison results are summarized below. First, parameter settings can strongly affect the accuracy of DM identification. Compared with the default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates. Second, all five methods give more accurate results when identifying simulated DM regions that are long and have small within-group variation, but they show low concordance, probably because of the different approaches they use for DM identification. Third, HMM-DM and HMM-Fisher yield relatively higher sensitivity and lower false positive rates than the others, especially in DM regions with large variation. Finally, we find that among the three methods that involve methylation estimation (methylKit, BSmooth, and BiSeq), BiSeq best presents the raw methylation signals. Therefore, based on these results, we suggest that users select DM identification methods based on the characteristics of their data and the advantages of each method.

Volume 15, pages 173-191

Citations: 20

Belief propagation in genotype-phenotype networks

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-03-01 DOI: 10.1515/sagmb-2015-0058

Janhavi Moharil, Paul May, D. Gaile, R. Blair

Abstract: Graphical models have proven to be a valuable tool for connecting genotypes and phenotypes. Structural learning of phenotype-genotype networks has received considerable attention in the post-genome era. In recent years, a dozen different methods have emerged for network inference, which leverage natural variation that arises in certain genetic populations. The structure of the network itself can be used to form hypotheses based on the inferred direct and indirect network relationships, but it represents a premature endpoint to the graphical analyses. In this work, we extend this endpoint. We examine the unexplored problem of perturbing a given network structure and quantifying the system-wide effects on the network in a node-wise manner. The perturbation is achieved by setting the values of phenotype node(s), which may reflect an inhibition or activation, and propagating this information through the entire network. We leverage belief propagation methods in Conditional Gaussian Bayesian Networks (CG-BNs) to absorb and propagate phenotypic evidence through the network. We show that the modeling assumptions adopted for genotype-phenotype networks represent an important sub-class of CG-BNs, which possess properties that ensure exact inference in the propagation scheme. The system-wide effects of the perturbation are quantified in a node-wise manner by comparing perturbed and unperturbed marginal distributions using a symmetric Kullback-Leibler divergence. Applications to kidney and skin cancer expression quantitative trait loci (eQTL) data from different Mus musculus populations are presented. System-wide effects in the network were predicted and visualized across a spectrum of evidence. Sub-pathways and regions of the network responded in concert, suggesting co-regulation and coordination throughout the network in response to phenotypic changes. We demonstrate how these predicted system-wide effects can be examined in connection with estimated class probabilities for covariates of interest, e.g. cancer status. Despite the uncertainty in the network structure, we demonstrate that the system-wide predictions are stable across an ensemble of highly likely networks. A software package, geneNetBP, which implements our approach, was developed in the R programming language.

Volume 15, pages 39-53

Citations: 7
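
The node-wise comparison described in the abstract above, perturbed versus unperturbed marginals under a symmetric Kullback-Leibler divergence, can be sketched for a single Gaussian node. The abstract does not give the exact definition used, so this sketch assumes the common convention of summing the two directed divergences, and it assumes univariate normal marginals. The function names are hypothetical and are not taken from geneNetBP.

```python
import math


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """Directed KL(P || Q) for P = N(mu_p, sd_p^2) and Q = N(mu_q, sd_q^2)."""
    if sd_p <= 0 or sd_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
        - 0.5
    )


def symmetric_kl(mu_p, sd_p, mu_q, sd_q):
    """Symmetric KL, assumed here to be KL(P || Q) + KL(Q || P)."""
    return kl_gaussian(mu_p, sd_p, mu_q, sd_q) + kl_gaussian(mu_q, sd_q, mu_p, sd_p)


# Toy marginals for one node before and after absorbing evidence.
unperturbed = (0.0, 1.0)  # (mean, standard deviation)
perturbed = (1.0, 1.0)
print(symmetric_kl(*unperturbed, *perturbed))
```

A value of zero means the evidence left that node's marginal unchanged. Larger values flag nodes whose distribution shifted most under the perturbation.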

Identification of consistent functional genetic modules

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-03-01 DOI: 10.1515/sagmb-2015-0026

J. Miecznikowski, D. Gaile, Xiwei Chen, D. Tritchler

Citations: 6

HMM-DM: identifying differentially methylated regions using a hidden Markov model

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-03-01 DOI: 10.1515/sagmb-2015-0077

Xiaoqing Yu, Shuying Sun

Citations: 30

HMM-Fisher: identifying differential methylation using a hidden Markov model and Fisher’s exact test

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-03-01 DOI: 10.1515/sagmb-2015-0076

Shuying Sun, Xiaoqing Yu

Abstract: DNA methylation is an epigenetic event that plays an important role in regulating gene expression. It is important to study DNA methylation, especially differential methylation patterns between two groups of samples (e.g. patients vs. normal individuals). With next-generation sequencing (NGS) technologies, it is now possible to identify differential methylation patterns by considering methylation at the single CG site level across an entire genome. However, it is challenging to analyze large and complex NGS data. To address this difficulty, we have developed a new statistical method that uses a hidden Markov model and Fisher’s exact test (HMM-Fisher) to identify differentially methylated cytosines and regions. We first use a hidden Markov chain to model the methylation signals and infer the methylation state of each individual sample as Not methylated (N), Partly methylated (P), or Fully methylated (F). We then use Fisher’s exact test to identify differentially methylated CG sites. We present the HMM-Fisher method and compare it with commonly cited methods using both simulated data and real sequencing data. The results show that HMM-Fisher outperforms the currently available methods with which we compared it. HMM-Fisher is efficient and robust in identifying heterogeneous DM regions.

Volume 15, pages 55-67

Citations: 24
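
The second step in the HMM-Fisher abstract above, applying Fisher’s exact test at a CG site, can be sketched for a 2x2 table. This is a simplification and not the authors' code. The abstract describes three inferred states (N, P, F), so collapsing them to "methylated" versus "not methylated" is an assumption made here to keep the table 2x2. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: group 1 / group 2. Columns: methylated / not methylated
    (a collapsed version of the N/P/F states, assumed for illustration).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x methylated samples in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are no more
    # likely than the observed one (small tolerance for float ties).
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


# Toy site: 3 of 3 patients methylated, 0 of 3 controls methylated.
print(fisher_exact_two_sided(3, 0, 0, 3))
```

An exact test suits this setting because the number of samples per group is usually small, which makes large-sample approximations such as the chi-squared test unreliable.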

MDI-GPU: accelerating integrative modelling for genomic-scale data using GP-GPU computing

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2016-02-24 DOI: 10.1515/sagmb-2015-0055

Samuel A. Mason, Faiz Sayyid, Paul D. W. Kirk, Colin Starr, D. Wild

Citations: 8

Homology cluster differential expression analysis for interspecies mRNA-Seq experiments

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2015-12-01 DOI: 10.1515/sagmb-2014-0056

J. Gelfond, J. Ibrahim, Ming-Hui Chen, Wei Sun, Kaitlyn N. Lewis, Sean Kinahan, Matthew A. Hibbs, R. Buffenstein

Abstract: There is an increasing demand for exploration of the transcriptomes of multiple species with extraordinary traits, such as the naked mole-rat (NMR). The NMR is remarkable because of its longevity and resistance to developing cancer. It is of scientific interest to understand the molecular mechanisms that impart these traits, and RNA-sequencing experiments with comparator species can correlate transcriptome dynamics with these phenotypes. Comparing transcriptome differences requires a homology mapping of each transcript in one species to transcript(s) in the other. Such mappings are necessary, especially if one species does not have a well-annotated genome available. Current approaches for this type of analysis typically identify the best match for each transcript, but best-match analysis ignores the inherent risk of mismatch when there are multiple candidate transcripts with similar homology scores. We present a method that treats the set of homologs from a novel species as a cluster corresponding to a single gene in the reference species, and we compare the cluster-based approach to a conventional best-match analysis in both simulated data and a case study with NMR and mouse tissues. We demonstrate that the cluster-based approach has superior power to detect differential expression.

Volume 14, pages 507-516

Citations: 2

On the validity of within-nuclear-family genetic association analysis in samples of extended families

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2015-12-01 DOI: 10.1515/sagmb-2015-0056

A. Bureau, T. Duchesne

Abstract: Splitting extended families into their component nuclear families in order to apply a genetic association method designed for nuclear families is a widespread practice in familial genetic studies. Dependence among genotypes and phenotypes of nuclear families from the same extended family arises because of genetic linkage of the tested marker with a risk variant or because of familial specificity of genetic effects due to gene-environment interaction. This raises concerns about the validity of inference conducted under the assumption that the nuclear families are independent. We indeed prove theoretically that, in a conditional logistic regression analysis applicable to disease cases and their genotyped parents, the naive model-based estimator of the variance of the coefficient estimates underestimates the true variance. However, simulations with realistic effect sizes of risk variants and variation of this effect from family to family reveal that the underestimation is negligible. The simulations also show the greater efficiency of the model-based variance estimator compared to a robust empirical estimator. Our recommendation is therefore to use the model-based estimator of variance for inference on the effects of genetic variants.

Volume 14, pages 533-549

Citations: 3