Statistical Applications in Genetics and Molecular Biology: Latest Articles, Page 9

A maximum likelihood approach to functional mapping of longitudinal binary traits.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-11-22. DOI: 10.1515/1544-6115.1675

Chenguang Wang, Hongying Li, Zhong Wang, Yaqun Wang, Ningtao Wang, Zuoheng Wang, Rongling Wu

Abstract: Despite their importance in biology and biomedicine, genetic mapping of binary traits that change over time has not been well explored. In this article, we develop a statistical model for mapping quantitative trait loci (QTLs) that govern longitudinal responses of binary traits. The model is constructed within the maximum likelihood framework by which the association between binary responses is modeled in terms of conditional log odds-ratios. With this parameterization, the maximum likelihood estimates (MLEs) of marginal mean parameters are robust to the misspecification of time dependence. We implement an iterative procedure to obtain the MLEs of QTL genotype-specific parameters that define longitudinal binary responses. The usefulness of the model was validated by analyzing a real example in rice. Simulation studies were performed to investigate the statistical properties of the model, showing that the model has power to identify and map specific QTLs responsible for the temporal pattern of binary traits.

Volume 11, Issue 6, Article 2.

Citations: 0

Time dependent ROC curves for the estimation of true prognostic capacity of microarray data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-11-22. DOI: 10.1515/1544-6115.1815

Yohann Foucher, Richard Danger

Citations: 19

Large-scale parentage inference with SNPs: an efficient algorithm for statistical confidence of parent pair allocations.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-11-08. DOI: 10.1515/1544-6115.1833

Eric C Anderson

Abstract: Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.

Volume 11, Issue 5.

Citations: 36
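
The abstract above refers to the cumulative number of Mendelian incompatibilities between an offspring and a putative parent pair. The following is only a minimal sketch of that counting step, not the SNPPIT algorithm or its Markov chain: it assumes biallelic SNPs coded as 0, 1, or 2 copies of one allele, ignores missing data and genotyping error, and the function names are hypothetical.

```python
from itertools import product

# Alleles (0 or 1) that a parent with a given genotype can transmit.
# Genotypes are coded as the number of copies of the "1" allele (assumption).
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_incompatible(offspring, parent_a, parent_b):
    """Return True if no pair of transmitted alleles yields the offspring genotype."""
    possible = {
        a + b
        for a, b in product(TRANSMITTABLE[parent_a], TRANSMITTABLE[parent_b])
    }
    return offspring not in possible


def cumulative_incompatibilities(offspring, parent_a, parent_b):
    """Running count of Mendelian incompatibilities across loci, in locus order."""
    counts, total = [], 0
    for o, a, b in zip(offspring, parent_a, parent_b):
        total += is_incompatible(o, a, b)
        counts.append(total)
    return counts


# Hypothetical genotypes at five SNPs for one offspring and one candidate pair.
kid = [0, 1, 2, 1, 0]
mum = [0, 2, 1, 0, 2]
dad = [1, 2, 1, 0, 1]
print(cumulative_incompatibilities(kid, mum, dad))  # [0, 1, 1, 2, 3]
```

A true parent pair genotyped without error would show a final count of zero, which is why the count is a natural quantity to track across loci when screening very many candidate pairs.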

ExactDAS: an exact test procedure for the detection of differential alternative splicing in microarray experiments.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-11-06. DOI: 10.1515/1544-6115.1814

Tristan Mary-Huard, Florence Jaffrezic, Stéphane Robin

Citations: 0

Variational Bayes procedure for effective classification of tumor type with microarray gene expression data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-10-30. DOI: 10.1515/1544-6115.1700

Takeshi Hayashi

Abstract: Recently, microarrays that can simultaneously measure the expression levels of thousands of genes have become a valuable tool for classifying tumors. For such classification, where the sample size is usually much smaller than the number of genes, it is essential to construct properly sparse models for accurately predicting tumor types to avoid over-fitting. Bayesian shrinkage estimation is considered a suitable method for providing such sparse models, effectively shrinking estimates of the effects for many irrelevant genes to zero while maintaining those of a small number of relevant genes at significant magnitudes. However, Bayesian analysis usually requires time-consuming computational techniques such as computationally intensive MCMC iterations. This paper describes a computationally effective method of Bayesian shrinkage regression (BSR) incorporating multiple hierarchical structures for constructing a classification model for tumor types using microarray gene expression data. We use a variational approximation method which provides simple approximations of posterior distributions of parameters to reduce computational burden in the Bayesian estimation. This computationally efficient BSR procedure yields a properly sparse model for accurately and rapidly classifying tumor samples. The accuracy of tumor classification is shown to be at least equivalent to that of other methods such as support vector machine and partial least squares using simulated and actual gene expression data sets.

Volume 11, Issue 5, Article 9.

Citations: 0

Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-10-22. DOI: 10.1515/1544-6115.1826

Steven P Lund, Dan Nettleton, Davis J McCarthy, Gordon K Smyth

Citations: 280

Analyzing genetic association studies with an extended propensity score approach.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-10-19. DOI: 10.1515/1544-6115.1790

Huaqing Zhao, Timothy R Rebbeck, Nandita Mitra

Abstract: Propensity scores are commonly used to address confounding in observational studies. However, they have not been previously adapted to deal with bias in genetic association studies. We propose an extension of our previous method (Zhao et al., 2009) that uses a multilevel propensity score approach and allows one to estimate the effect of a genotype under an additive model and also simultaneously adjusts for confounders such as genetic ancestry and patient and disease characteristics. Using simulation studies, we demonstrate that this extended genetic propensity score (eGPS) can adequately adjust and consistently correct for bias due to confounding in a variety of circumstances. Under all simulation scenarios, the eGPS method yields estimates with bias close to 0 (mean=0.018, standard error=0.01). Our method also preserves statistical properties such as coverage probability, Type I error, and power. We illustrate this approach in a population-based genetic association study of testicular germ cell tumors and KITLG and SPRY4 susceptibility genes. We conclude that our method provides a novel and broadly applicable analytic strategy for obtaining less biased and more valid estimates of genetic associations.

Volume 11, Issue 5.

Citations: 7

Empirical Bayesian selection of hypothesis testing procedures for analysis of sequence count expression data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-10-19. DOI: 10.1515/1544-6115.1773

Stanley B Pounds, Cuilan L Gao, Hui Zhang

Abstract: Differential expression analysis of sequence-count expression data involves performing a large number of hypothesis tests that compare the expression count data of each gene or transcript across two or more biological conditions. The assumptions of any specific hypothesis-testing method will probably not be valid for each of a very large number of genes. Thus, computational evaluation of assumptions should be incorporated into the analysis to select an appropriate hypothesis-testing method for each gene. Here, we generalize earlier work to introduce two novel procedures that use estimates of the empirical Bayesian probability (EBP) of overdispersion to select or combine results of a standard Poisson likelihood ratio test and a quasi-likelihood test for each gene. These EBP-based procedures simultaneously evaluate the Poisson-distribution assumption and account for multiple testing. With adequate power to detect overdispersion, the new procedures select the standard likelihood test for each gene with Poisson-distributed counts and the quasi-likelihood test for each gene with overdispersed counts. The new procedures outperformed previously published methods in many simulation studies. We also present a real-data analysis example and discuss how the framework used to develop the new procedures may be generalized to further enhance performance. An R code library that implements the methods is freely available at www.stjuderesearch.org/depts/biostats/software.

Volume 11, Issue 5.

Citations: 13
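
The abstract above contrasts a standard Poisson likelihood ratio test with a quasi-likelihood test for overdispersed counts. As a rough illustration only (not the authors' R library or their EBP procedure), the sketch below computes a two-group Poisson likelihood ratio statistic for one gene and a simple variance-to-mean ratio as an informal overdispersion check. It assumes equal library sizes across samples; the function names and counts are hypothetical.

```python
import math
import statistics


def poisson_lrt(counts_a, counts_b):
    """Likelihood ratio test of equal Poisson means in two groups (1 df)."""
    n_a, n_b = len(counts_a), len(counts_b)
    x_a, x_b = sum(counts_a), sum(counts_b)
    pooled = (x_a + x_b) / (n_a + n_b)  # MLE of the common mean under H0

    def term(x, n):
        # x * log(observed / expected); the limit as x -> 0 is 0.
        return x * math.log(x / (n * pooled)) if x > 0 else 0.0

    stat = max(2.0 * (term(x_a, n_a) + term(x_b, n_b)), 0.0)
    # Survival function of a chi-square with 1 df: P(Z^2 > stat).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def dispersion_index(counts):
    """Sample variance over mean; close to 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical counts for one gene in two conditions, three samples each.
stat, p_value = poisson_lrt([10, 12, 11], [20, 22, 21])
print(round(stat, 2), p_value < 0.01)
print(round(dispersion_index([10, 12, 11]), 3))
```

A dispersion index well above 1 suggests that the Poisson assumption is doubtful for that gene, which is the situation in which the abstract's procedures would favor the quasi-likelihood test.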

Estimators of the local false discovery rate designed for small numbers of tests.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-10-12. DOI: 10.1515/1544-6115.1807

Marta Padilla, David R Bickel

Abstract: Histogram-based empirical Bayes methods developed for analyzing data for large numbers of genes, SNPs, or other biological features tend to have large biases when applied to data with a smaller number of features such as genes with expression measured conventionally, proteins, and metabolites. To analyze such small-scale and medium-scale data in an empirical Bayes framework, we introduce corrections of maximum likelihood estimators (MLEs) of the local false discovery rate (LFDR). In this context, the MLE estimates the LFDR, which is a posterior probability of null hypothesis truth, by estimating the prior distribution. The corrections lie in excluding each feature when estimating one or more parameters on which the prior depends. In addition, we propose the expected LFDR (ELFDR) in order to propagate the uncertainty involved in estimating the prior. We also introduce an optimally weighted combination of the best of the corrected MLEs with a previous estimator that, being based on a binomial distribution, does not require a parametric model of the data distribution across features. An application of the new estimators and previous estimators to protein abundance data illustrates the extent to which different estimators lead to different conclusions about which proteins are affected by cancer. A simulation study was conducted to approximate the bias of the new estimators relative to previous LFDR estimators. Data were simulated for two different numbers of features (N), two different noncentrality parameter values or detectability levels (dalt), and several proportions of unaffected features (p0). One of these previous estimators is a histogram-based estimator (HBE) designed for a large number of features. The simulations show that some of the corrected MLEs and the ELFDR that corrects the HBE reduce the negative bias relative to the MLE and the HBE, respectively. For every method, we defined the worst-case performance as the maximum of the absolute value of the bias over the two different dalt and over various p0. The best worst-case methods represent the safest methods to be used under given conditions. This analysis indicates that the binomial-based method has the lowest worst-case absolute bias for high p0 and for N = 3, 12. However, the corrected MLE that is based on the minimum description length (MDL) principle is the best worst-case method when the value of p0 is more uncertain since it has one of the lowest worst-case biases over all possible values of p0 and for N = 3, 12. Therefore, the safest estimator considered is the binomial-based method when a high proportion of unaffected features can be assumed and the MDL-based method otherwise. A second simulation study was conducted with additional values of N. We found that HBE requires N to be at least 6-12 features to perform as well as the estimators proposed here, with the precise minimum N depending on p0 and dalt.

Volume 11, Issue 5, pages 4.

Citations: 5
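
The abstract above describes the LFDR as the posterior probability that a null hypothesis is true. A minimal sketch of that quantity under the usual two-group mixture follows. It is not one of the paper's estimators: it assumes the parameters are known, and it assumes (for illustration only) that the statistic is N(0, 1) for unaffected features and N(d_alt, 1) for affected ones. The names `local_fdr`, `p0`, and `d_alt` are chosen here to echo the abstract's notation.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(x, p0, d_alt):
    """Posterior probability of the null given statistic x.

    LFDR(x) = p0 * f0(x) / (p0 * f0(x) + (1 - p0) * f1(x)),
    with f0 = N(0, 1) and f1 = N(d_alt, 1) (illustrative assumption).
    """
    null_part = p0 * normal_pdf(x)
    alt_part = (1.0 - p0) * normal_pdf(x, mean=d_alt)
    return null_part / (null_part + alt_part)


# Hypothetical setting: 80% unaffected features, detectability level 2.
for x in (0.0, 1.0, 4.0):
    print(x, round(local_fdr(x, p0=0.8, d_alt=2.0), 3))
```

At x = d_alt / 2 the two densities are equal, so the LFDR equals the prior proportion p0. In practice p0 and d_alt are unknown and must be estimated from few features, which is the source of the bias the abstract studies.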

Genotype copy number variations using Gaussian mixture models: theory and algorithms.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-10-12. DOI: 10.1515/1544-6115.1725

Chang-Yun Lin, Yungtai Lo, Kenny Q Ye

Abstract: Copy number variations (CNVs) are important in disease association studies and are usually targeted by most recent microarray platforms developed for GWAS studies. However, the probes targeting the same CNV regions could vary greatly in performance, with some of the probes carrying little more information than pure noise. In this paper, we investigate how to best combine measurements of multiple probes to estimate copy numbers of individuals under the framework of the Gaussian mixture model (GMM). First we show that, under two regularity conditions and assuming that all the parameters except the mixing proportions are known, optimal weights can be obtained so that the univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM does. We then developed an algorithm that iteratively estimates the parameters and obtains the optimal weights, and uses them for classification. The algorithm performs well on simulation data and two sets of real data, showing a clear advantage over classification based on the equally weighted average.

Volume 11, Issue 5, pages 5.

Citations: 7
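
The abstract above describes classifying copy number from a weighted average of probe measurements under a univariate GMM. The sketch below illustrates only that classification step with all parameters treated as known. It does not derive the paper's optimal weights or run its iterative estimation; the weights, component means, standard deviations, and mixing proportions are made-up values, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_average(probes, weights):
    """Combine probe intensities into one score using normalized weights."""
    return sum(w * x for w, x in zip(weights, probes)) / sum(weights)


def classify(score, means, sds, proportions):
    """Return (index of most probable component, posterior probabilities)."""
    joint = [p * normal_pdf(score, m, s)
             for p, m, s in zip(proportions, means, sds)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior


# Hypothetical individual: two informative probes and one noisy probe.
probes = [0.9, 1.2, 3.0]
means, sds, props = [0.0, 1.0, 2.0], [0.3, 0.3, 0.3], [0.1, 0.6, 0.3]

down_weighted = weighted_average(probes, [0.5, 0.45, 0.05])
equal = weighted_average(probes, [1.0, 1.0, 1.0])
print(classify(down_weighted, means, sds, props)[0])  # component 1
print(classify(equal, means, sds, props)[0])          # component 2
```

In this toy case, down-weighting the noisy probe changes the call from component 2 to component 1, which is the kind of difference between weighted and equally weighted averages that the abstract reports.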