Statistical Applications in Genetics and Molecular Biology: Latest Articles (Page 7)

Testing genotypes-phenotype relationships using permutation tests on association rules.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2015-02-01, DOI: 10.1515/sagmb-2014-0033

Mateen Shaikh, Joseph Beyene

Citations: 2

A Bayesian mixture model for chromatin interaction data.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2015-02-01, DOI: 10.1515/sagmb-2014-0029

Liang Niu, Shili Lin

Abstract: Chromatin interactions mediated by a particular protein are of interest for studying gene regulation, especially the regulation of genes that are associated with, or known to be causative of, a disease. A recent molecular technique, Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), that uses chromatin immunoprecipitation (ChIP) and high throughput paired-end sequencing, is able to detect such chromatin interactions genomewide. However, ChIA-PET may generate noise (i.e., pairings of DNA fragments by random chance) in addition to true signal (i.e., pairings of DNA fragments by interactions). In this paper, we propose MC_DIST based on a mixture modeling framework to identify true chromatin interactions from ChIA-PET count data (counts of DNA fragment pairs). The model is cast into a Bayesian framework to take into account the dependency among the data and the available information on protein binding sites and gene promoters to reduce false positives. A simulation study showed that MC_DIST outperforms the previously proposed hypergeometric model in terms of both power and type I error rate. A real data study showed that MC_DIST may identify potential chromatin interactions between protein binding sites and gene promoters that may be missed by the hypergeometric model. An R package implementing the MC_DIST model is available at http://www.stat.osu.edu/~statgen/SOFTWARE/MDM.

Volume 14, Issue 1, pp. 53-64. Publication type: Journal Article.

Citations: 6
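The abstract of the Niu and Lin paper describes separating random-chance pairings from true interactions with a mixture model over ChIA-PET counts. As a rough illustration of the mixture idea only, the sketch below fits a two-component Poisson mixture by plain EM. It is not MC_DIST: the Bayesian formulation, the dependency among the data, and the binding-site and promoter information are all omitted. The Poisson components, the function name `fit_poisson_mixture`, the starting values, and the toy counts are assumptions made for illustration.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def fit_poisson_mixture(counts, n_iter=200):
    """EM for a two-component Poisson mixture (noise vs. signal).

    Returns (weight_signal, lam_noise, lam_signal, posteriors), where
    posteriors[i] is the estimated probability that counts[i] is signal.
    """
    # Hypothetical starting values: noise rate below the mean, signal above.
    mean = sum(counts) / len(counts)
    lam_noise, lam_signal, w = max(mean / 2, 1e-3), mean * 2 + 1e-3, 0.5
    post = [0.5] * len(counts)
    for _ in range(n_iter):
        # E-step: posterior probability of the signal component per count.
        post = []
        for k in counts:
            log_s = math.log(w) + poisson_logpmf(k, lam_signal)
            log_n = math.log(1 - w) + poisson_logpmf(k, lam_noise)
            m = max(log_s, log_n)
            ps, pn = math.exp(log_s - m), math.exp(log_n - m)
            post.append(ps / (ps + pn))
        # M-step: update the mixing weight and the two component rates.
        total_s = sum(post)
        total_n = len(counts) - total_s
        w = min(max(total_s / len(counts), 1e-6), 1 - 1e-6)
        lam_signal = max(
            sum(p * k for p, k in zip(post, counts)) / max(total_s, 1e-12), 1e-6
        )
        lam_noise = max(
            sum((1 - p) * k for p, k in zip(post, counts)) / max(total_n, 1e-12), 1e-6
        )
    return w, lam_noise, lam_signal, post

# Toy counts of DNA fragment pairs: mostly small (noise), a few large (signal).
counts = [0, 1, 1, 2, 0, 1, 2, 1, 20, 22, 19, 21, 25]
w, lam_noise, lam_signal, post = fit_poisson_mixture(counts)
```

Pairs whose posterior in `post` is close to 1 would be the candidate interactions in this simplified setting; a threshold on that posterior is a design choice the sketch leaves open.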

A region-based multiple testing method for hypotheses ordered in space or time.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2015-02-01, DOI: 10.1515/sagmb-2013-0075

Rosa J Meijer, Thijmen J P Krebs, Jelle J Goeman

Citations: 12

A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2015-02-01, DOI: 10.1515/sagmb-2014-0007

Jürgen Claesen, Tomasz Burzykowski

Abstract: The analysis of polygenic, phenotypic characteristics such as quantitative traits or inheritable diseases requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms as genetic markers. Combining the technologies with pooling of segregants, as performed in bulk segregant analysis, should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. We propose a hidden Markov-model to analyze the marker data obtained by the bulk segregant next generation sequencing. The model includes several states, each associated with a different probability of observing the same/different nucleotide in an offspring as compared to the parent. The transitions between the molecular markers imply transitions between the states of the model. After estimating the transition probabilities and state-related probabilities of nucleotide (dis)similarity, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions may be likely to contain trait-related genes. The application of the model is illustrated on the data from a study of ethanol tolerance in yeast. Software is written in R. R-functions, R-scripts and documentation are available on www.ibiostat.be/software/bioinformatics.

Volume 14, Issue 1, pp. 21-34. Publication type: Journal Article.

Citations: 9
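The decoding step mentioned in the Claesen and Burzykowski abstract (choosing a state for each SNP once the probabilities are estimated) can be sketched with a small discrete HMM. The abstract does not say which decoding rule the authors use, so the log-space Viterbi algorithm below is a swapped-in, standard choice rather than their method. The two states, all probabilities, and the binary coding of observations (1 = same nucleotide as the parent, 0 = different) are hypothetical values chosen for illustration; parameter estimation is not shown.

```python
import math

def viterbi(obs, start, trans, emit):
    """Most probable state path for a discrete HMM (log-space Viterbi).

    obs   : list of observation symbols (1 = same nucleotide as parent, 0 = different)
    start : start[s] = initial probability of state s
    trans : trans[s][t] = probability of moving from state s to state t
    emit  : emit[s][o] = probability of observing symbol o in state s
    """
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for t in range(n_states):
            # Best predecessor state for landing in state t at this SNP.
            best = max(range(n_states), key=lambda s: score[s] + math.log(trans[s][t]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][t]) + math.log(emit[t][o]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: state 0 = unlinked region, state 1 = trait-linked region.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.5, 0.5], [0.05, 0.95]]  # emit[state][symbol]
obs = [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
path = viterbi(obs, start, trans, emit)
```

With these toy values the run of eight matching SNPs in the middle of `obs` is decoded as state 1 and the flanking SNPs as state 0, which is the kind of contiguous region the abstract describes as a candidate for trait-related genes.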

Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2014-12-01, DOI: 10.1515/sagmb-2014-0003

Victor L Jong, Putri W Novianti, Kit C B Roes, Marinus J C Eijkemans

Abstract: The literature shows that classifiers perform differently across datasets and that correlations within datasets affect the performance of classifiers. The question that arises is whether the correlation structure within datasets differs significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories: inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of two filtering methods, detection call filtering and variance filtering, on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using the Box's M statistic on permuted samples. We found that correlation structures significantly differ between datasets of the same and/or different etiological disease categories and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.

Volume 13, Issue 6, pp. 717-32. Publication type: Journal Article.

Citations: 6

Covariate adjusted differential variability analysis of DNA methylation with propensity score method.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2014-12-01, DOI: 10.1515/sagmb-2013-0072

Pei Fen Kuan

Citations: 1

P-value calibration for multiple testing problems in genomics.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2014-12-01, DOI: 10.1515/sagmb-2013-0074

John P Ferguson, Dean Palejev

Citations: 1

When is Menzerath-Altmann law mathematically trivial? A new approach.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2014-12-01, DOI: 10.1515/sagmb-2013-0034

Ramon Ferrer-i-Cancho, Antoni Hernández-Fernández, Jaume Baixeries, Łukasz Dębowski, Ján Mačutek

Abstract: Menzerath's law, the tendency of Z (the mean size of the parts) to decrease as X (the number of parts) increases, is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z=Y/X, which would imply that Z scales with X as Z ∼ 1/X. That scaling is a very particular case of Menzerath-Altmann law that has been rejected by means of a correlation test between X and Y in genomes, with X being the number of chromosomes of a species, Y its genome size in bases and Z the mean chromosome size. Here we review the statistical foundations of that test and consider three non-parametric tests based upon different correlation metrics and one parametric test to evaluate if Z ∼ 1/X in genomes. The most powerful test is a new non-parametric one based upon the correlation ratio, which is able to reject Z ∼ 1/X in nine out of 11 taxonomic groups and detect a borderline group. Rather than a fact, Z ∼ 1/X is a baseline that real genomes do not meet. The view of Menzerath-Altmann law as inevitable is seriously flawed.

Volume 13, Issue 6, pp. 633-44. Publication type: Journal Article.

Citations: 22
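The link between the Z ∼ 1/X baseline and a test of dependence between X and Y, as described in the Ferrer-i-Cancho et al. abstract, can be written out in one step. This is a sketch under one stated assumption: Z ∼ 1/X is read as a statement about the conditional mean of Z given X. The constant $c$ is notation introduced here, not taken from the paper.

```latex
Z = \frac{Y}{X}, \qquad
\mathbb{E}[Y \mid X = x] = c \ \text{ for all } x
\;\Longrightarrow\;
\mathbb{E}[Z \mid X = x] = \frac{\mathbb{E}[Y \mid X = x]}{x} = \frac{c}{x}.
```

Under this reading, the 1/X scaling holds when the expected genome size does not change with the number of chromosomes. Evidence that the conditional mean of Y varies with X, which is what a correlation ratio measures, therefore counts as evidence against Z ∼ 1/X.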

Markovianness and conditional independence in annotated bacterial DNA.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2014-12-01, DOI: 10.1515/sagmb-2014-0002

Andrew Hart, Servet Martínez

Citations: 5

Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations.

IF 0.9, Region 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology, Pub Date: 2014-12-01, DOI: 10.1515/sagmb-2013-0084

Mette Langaas, Øyvind Bakke

Abstract: In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, p-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. When the number of simulated permutations tends to infinity, the permutation p-value approaches the conditional enumeration p-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500-5000 cases and 500-15,000 controls, and significance levels from 5 × 10⁻⁸ to 0.05, thus our results are applicable to genetic association studies with only a few genetic markers under study, intermediate follow-up studies, and genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved for the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores and for the constrained likelihood ratio test. (ii) For significance levels below 0.05, for the test statistics under study, asymptotic approximations may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating p-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.

Volume 13, Issue 6, pp. 675-92. Publication type: Journal Article.

Citations: 6
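The contrast drawn in the Langaas and Bakke abstract between exact conditional enumeration and simulated permutations can be illustrated on a single 2×3 case-control table. The sketch below is a simplified stand-in, not the authors' implementation. It uses one Cochran-Armitage trend statistic with additive scores (0, 1, 2) rather than any of the seven robust statistics from the paper, and it conditions on both margins so that the case genotype counts follow a multivariate hypergeometric distribution under the null. The function names and the toy counts are assumptions made for illustration.

```python
import random
from math import comb

def trend_stat(cases, totals, scores=(0, 1, 2)):
    """Absolute centred Cochran-Armitage numerator. With fixed margins the
    variance term is constant, so it can be dropped when ranking tables."""
    n_cases, n = sum(cases), sum(totals)
    observed = sum(x * r for x, r in zip(scores, cases))
    expected = n_cases * sum(x * m for x, m in zip(scores, totals)) / n
    return abs(observed - expected)

def exact_conditional_pvalue(cases, controls):
    """Sum the null probabilities of every 2x3 table with the observed
    margins whose statistic is at least as large as the observed one."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n = sum(cases), sum(totals)
    t_obs = trend_stat(cases, totals)
    denom = comb(n, n_cases)
    p = 0.0
    for r0 in range(min(totals[0], n_cases) + 1):
        for r1 in range(min(totals[1], n_cases - r0) + 1):
            r2 = n_cases - r0 - r1
            if r2 > totals[2]:
                continue  # not a valid table for these margins
            if trend_stat((r0, r1, r2), totals) >= t_obs - 1e-9:
                weight = comb(totals[0], r0) * comb(totals[1], r1) * comb(totals[2], r2)
                p += weight / denom
    return p

def permutation_pvalue(cases, controls, n_perm=20000, seed=1):
    """Monte Carlo estimate of the same p-value by relabelling cases at random."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases = sum(cases)
    t_obs = trend_stat(cases, totals)
    genotypes = [g for g, m in enumerate(totals) for _ in range(m)]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genotypes, n_cases)  # genotypes of the relabelled cases
        perm_cases = [sample.count(g) for g in range(3)]
        if trend_stat(perm_cases, totals) >= t_obs - 1e-9:
            hits += 1
    return hits / n_perm

# Toy genotype counts (three genotypes of a biallelic marker).
cases, controls = (10, 12, 8), (20, 8, 2)
p_exact = exact_conditional_pvalue(cases, controls)
p_perm = permutation_pvalue(cases, controls)
```

The enumeration visits each admissible table once, so its cost depends on the margins and not on the size of the p-value. The permutation estimate needs many more draws to resolve small p-values, which is consistent with the efficiency argument in the abstract.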