Statistical Applications in Genetics and Molecular Biology: Latest Articles (Page 6)

Estimating intrinsic and extrinsic noise from single-cell gene expression measurements.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-12-01. DOI: 10.1515/sagmb-2016-0002

Audrey Qiuyan Fu, Lior Pachter

Volume 15, Issue 6, pp. 447-471.

Abstract: Gene expression is stochastic and displays variation ("noise") both within and between cells. Intracellular (intrinsic) variance can be distinguished from extracellular (extrinsic) variance by applying the law of total variance to data from two-reporter assays that probe expression of identically regulated gene pairs in single cells. We examine established formulas [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): "Stochastic gene expression in a single cell," Science, 297, 1183-1186] for the estimation of intrinsic and extrinsic noise and provide interpretations of them in terms of a hierarchical model. This allows us to derive alternative estimators that minimize bias or mean squared error. We provide a geometric interpretation of these results that clarifies the interpretation in Elowitz et al. (2002). We also demonstrate through simulation and re-analysis of published data that the distributional assumptions underlying the hierarchical model must be satisfied for the estimators to produce sensible results, which highlights the importance of normalization.

Citations: 23
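For orientation on the Fu and Pachter entry: the established Elowitz et al. (2002) estimators are usually written, for paired reporter intensities c and y measured in the same cells, as normalized intrinsic noise ⟨(c − y)²⟩ / (2⟨c⟩⟨y⟩) and normalized extrinsic noise (⟨cy⟩ − ⟨c⟩⟨y⟩) / (⟨c⟩⟨y⟩), where angle brackets denote averages over cells. The sketch below computes these plug-in versions; the function name `elowitz_noise` is hypothetical, the inputs are assumed to be already normalized to comparable scales, and the bias- and MSE-minimizing alternatives derived in the paper are not reproduced here.

```python
from statistics import fmean


def elowitz_noise(c, y):
    """Plug-in (intrinsic, extrinsic, total) squared-noise estimates.

    c, y: paired intensities of two identically regulated reporters measured
    in the same cells, assumed already normalized to comparable scales.
    """
    if len(c) != len(y) or len(c) < 2:
        raise ValueError("need at least two paired measurements")
    denom = fmean(c) * fmean(y)  # <c><y>
    # Intrinsic: half the mean squared difference between the two reporters.
    eta_int2 = fmean((ci - yi) ** 2 for ci, yi in zip(c, y)) / (2 * denom)
    # Extrinsic: plug-in covariance of the reporters (divisor n), normalized.
    eta_ext2 = (fmean(ci * yi for ci, yi in zip(c, y)) - denom) / denom
    # Total: average second moment of the two reporters minus <c><y>.
    eta_tot2 = (fmean((ci ** 2 + yi ** 2) / 2 for ci, yi in zip(c, y)) - denom) / denom
    return eta_int2, eta_ext2, eta_tot2
```

By construction the intrinsic and extrinsic values add up to the total. Because the extrinsic term is a sample covariance it can come out negative on small or poorly normalized samples: `elowitz_noise([1, 3], [3, 1])` returns `(0.5, -0.25, 0.25)`.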

Search of latent periodicity in amino acid sequences by means of genetic algorithm and dynamic programming.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-10-01. DOI: 10.1515/sagmb-2015-0079

Valentina Pugacheva, Alexander Korotkov, Eugene Korotkov

Citations: 26

Evaluation of low-template DNA profiles using peak heights.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-10-01. DOI: 10.1515/sagmb-2016-0038

Christopher D Steele, Matthew Greenhalgh, David J Balding

Volume 15, Issue 5, pp. 431-445.

Abstract: In recent years, statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram to quantitative use of peak heights. This is challenging because peak heights are highly variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to drop-in. Our model is incorporated in the open-source R code likeLTD. We apply it to 108 laboratory-generated crime-scene profiles and demonstrate techniques of model validation that are novel in the field. We use the results to explore the benefits of modeling peak heights, finding that it is not always advantageous, and to assess the merits of pre-extraction replication. We also introduce an approximation that can reduce computational complexity when there are multiple low-level contributors who are not of interest to the investigation, and we present a simple approximate adjustment for linkage between loci, making it possible to accommodate linkage when evaluating complex DNA profiles.

Citations: 28

The use of vector bootstrapping to improve variable selection precision in Lasso models.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-08-01. DOI: 10.1515/sagmb-2015-0043

Charles Laurin, Dorret Boomsma, Gitta Lubke

Volume 15, Issue 4, pp. 305-320.

Abstract: The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.

Citations: 22
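For the Laurin, Boomsma and Lubke entry, the idea of nesting K-fold cross-validation inside bootstrapping can be sketched in plain Python: each bootstrap replicate chooses its own Lasso penalty by K-fold CV, refits on the whole replicate, and records which variables get nonzero coefficients, yielding a per-variable selection frequency. All function names, the penalty grid and the defaults (B, k) are hypothetical, the Lasso is solved by a textbook coordinate-descent routine, and this is a generic illustration rather than the authors' vector bootstrap procedure or any package default.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by coordinate-descent Lasso."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def center(X, y):
    """Center the columns of X and y; return centered data plus the means."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    return Xc, [v - my for v in y], mx, my


def lasso_cd(X, y, lam, n_iter=200, tol=1e-6):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 on centered data."""
    n, p = len(X), len(X[0])
    b, r = [0.0] * p, list(y)  # r holds the current residuals y - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def cv_lambda(X, y, lambdas, k, rng):
    """Return the penalty with the lowest K-fold cross-validated squared error."""
    n = len(X)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best, best_err = lambdas[0], float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            Xc, yc, mx, my = center([X[i] for i in train], [y[i] for i in train])
            b = lasso_cd(Xc, yc, lam)
            for i in fold:
                pred = my + sum((X[i][j] - mx[j]) * b[j] for j in range(len(b)))
                err += (y[i] - pred) ** 2
        if err < best_err:
            best, best_err = lam, err
    return best


def bootstrap_selection_freq(X, y, lambdas, B=20, k=3, seed=0):
    """Share of bootstrap replicates in which each variable is selected,
    with K-fold CV for the penalty nested inside every replicate."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(B):
        rows = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in rows], [y[i] for i in rows]
        lam = cv_lambda(Xb, yb, lambdas, k, rng)  # CV nested in the replicate
        Xc, yc, _, _ = center(Xb, yb)
        for j, coef in enumerate(lasso_cd(Xc, yc, lam)):
            if coef != 0.0:
                counts[j] += 1
    return [c / B for c in counts]
```

A call such as `bootstrap_selection_freq(X, y, [0.01, 0.05, 0.1, 0.3, 1.0])` returns one frequency per column of X; a strong predictor would typically be selected in every replicate, while variables that enter only occasionally resemble the low-reliability selections the abstract describes. One simplification to keep in mind: bootstrap samples contain repeated rows, so the same observation can land in both the training and held-out parts of the inner CV, which makes the CV error somewhat optimistic.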

Bayesian state space models for dynamic genetic network construction across multiple tissues.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-08-01. DOI: 10.1515/sagmb-2014-0055

Yulan Liang, Arpad Kelemen

Volume 15, Issue 4, pp. 273-290.

Abstract: Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research on complex diseases, and estimating the dynamic changes in temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge by inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimation of the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models, together with Markov chain Monte Carlo and Gibbs sampling algorithms, are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to Affymetrix time-course data sets from multiple tissues (liver, skeletal muscle, and kidney) following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and the gene-gene interactions in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approaches could be extended and applied to other large-scale genomic data, such as next-generation sequencing (NGS) combined with real-time and time-varying electronic health records (EHR), for more comprehensive and robust systems- and network-based analysis, in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare, with better decision making and patient outcomes.

Citations: 4
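For the Liang and Kelemen entry, the two ingredients the abstract emphasizes, a time-varying transition matrix and a time-varying observation matrix, appear in a generic linear Gaussian state space model written as follows. Here x_t is the hidden state vector (which can also stand in for unobserved time points), y_t is the vector of observed expression values at time t, and A_t, C_t, Q_t and R_t are illustrative symbols chosen for this sketch; the hierarchical priors, hyper-priors and MCMC scheme of the paper are not reproduced.

```latex
\begin{aligned}
\mathbf{x}_t &= \mathbf{A}_t \mathbf{x}_{t-1} + \mathbf{w}_t,
  & \mathbf{w}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_t)
  && \text{(state transition, time-varying } \mathbf{A}_t\text{)} \\
\mathbf{y}_t &= \mathbf{C}_t \mathbf{x}_t + \mathbf{v}_t,
  & \mathbf{v}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}_t)
  && \text{(observation, time-varying } \mathbf{C}_t\text{)}
\end{aligned}
```

Allowing A_t and C_t (and, if desired, Q_t and R_t) to change with t is what lets such a model represent non-stationary dynamics; a time-invariant model would fix them to a single A and C.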

LandScape: a simple method to aggregate p-values and other stochastic variables without a priori grouping.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-08-01. DOI: 10.1515/sagmb-2015-0085

Carsten Wiuf, Jonatan Schaumburg-Müller Pallesen, Leslie Foldager, Jakob Grove

Volume 15, Issue 4, pp. 349-361.

Abstract: In many areas of science it is customary to perform many, potentially millions, of tests simultaneously. To gain statistical power, it is common to group tests based on a priori criteria such as predefined regions or sliding windows. However, it is not straightforward to choose grouping criteria, and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values without relying on a priori criteria are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the FWER is controlled in the strong sense. Validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregating p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).

Citations: 2

A joint modeling approach for uncovering associations between gene expression, bioactivity and chemical structure in early drug discovery to guide lead selection and genomic biomarker development.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-08-01. DOI: 10.1515/sagmb-2014-0086

Nolen Perualila-Tan, Adetayo Kasim, Willem Talloen, Bie Verbist, Hinrich W H Göhlmann, Ziv Shkedy

Volume 15, Issue 4, pp. 291-304.

Abstract: The modern drug discovery process involves multiple sources of high-dimensional data, which poses the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery, to better understand the chemical and biological mechanisms of candidate drugs and to facilitate early detection of safety issues prior to the later, more expensive phases of the drug development cycle. In this paper, we discuss a joint model for the transcriptomic and the phenotypic variables conditioned on the chemical structure. This modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity, taking into account the influence of the chemical structure of the compound on both variables. The model allows the detection of genes that are associated with the bioactivity data, facilitating the identification of potential genomic biomarkers for compound efficacy. In addition, the effect of every structural feature on both genes and pIC50, and on their association, can be investigated simultaneously. Two oncology projects are used to illustrate the applicability and usefulness of the joint model for integrating multi-source high-dimensional information to aid drug discovery.

Citations: 4
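For the Perualila-Tan et al. entry, the idea of an association between gene expression and bioactivity "taking into account" chemical structure can be illustrated under simplifying assumptions: one binary fingerprint feature z, and a bivariate normal model in which both the expression of one gene (x) and the bioactivity readout (y, e.g. pIC50) shift in mean with z. Under that assumption the structure-adjusted association is the correlation of the residual errors, and for a binary covariate the ordinary least-squares residuals are simply deviations from each group's mean. The function names are hypothetical, and this is a simplified sketch, not the authors' full joint model.

```python
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    s_aa = sum((u - ma) ** 2 for u in a)
    s_bb = sum((v - mb) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def residuals_given_binary(v, z):
    """OLS residuals of v on an intercept plus a binary feature z,
    i.e. each value minus the mean of its z-group."""
    groups = {}
    for vi, zi in zip(v, z):
        groups.setdefault(zi, []).append(vi)
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    return [vi - means[zi] for vi, zi in zip(v, z)]


def adjusted_association(x, y, z):
    """Correlation between x and y after removing the mean effect of z on each."""
    return pearson(residuals_given_binary(x, z), residuals_given_binary(y, z))
```

For example, with z = [0, 0, 0, 0, 1, 1, 1, 1], x = [1, -1, 1, -1, 6, 4, 6, 4] and y = [1, 1, -1, -1, 6, 6, 4, 4], `pearson(x, y)` is about 0.86 because both variables shift with z, whereas `adjusted_association(x, y, z)` is 0.0: once the structural feature is accounted for, the gene carries no information about bioactivity.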

Finding causative genes from high-dimensional data: an appraisal of statistical and machine learning approaches.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-08-01. DOI: 10.1515/sagmb-2015-0072

Chamont Wang, Jana L Gevertz

Volume 15, Issue 4, pp. 321-347.

Abstract: Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal; we call these causative genes. We found that none of the statistical methods tested could identify all the causative genes for all of the simulated cancers, even though increasing the sample size does improve the variable selection capabilities in most cases. Furthermore, certain statistical tools can classify our simulated data with a low error rate, yet the variables being used for classification are not necessarily the causative genes.

Citations: 1
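For the Wang and Gevertz entry, the distinction between classifying accurately and recovering the causative genes can be made concrete by scoring a method's selected gene set against the known causative set of a simulated cancer. The function and gene names below are hypothetical, and precision and recall are a common choice of score, not necessarily the metrics used in the paper.

```python
def selection_scores(selected, causative):
    """Precision and recall of a selected variable set against the known causative set."""
    selected, causative = set(selected), set(causative)
    hits = len(selected & causative)  # causative genes that were actually selected
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(causative) if causative else 0.0
    return precision, recall
```

For example, `selection_scores({"g1", "g2", "g9"}, {"g1", "g2", "g3", "g4"})` gives a precision of 2/3 and a recall of 0.5: two thirds of what the method selected is causative, yet it recovers only half of the causative genes, a gap that a low classification error rate alone would not reveal.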

Sparse factor model for co-expression networks with an application using prior biological knowledge.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2016-06-01. DOI: 10.1515/sagmb-2015-0002

Yuna Blum, Magalie Houée-Bigot, David Causeur

Volume 15, Issue 3, pp. 253-272.

Abstract: Inference on gene regulatory networks from high-throughput expression data is one of the main current challenges in systems biology. Such networks can be very insightful for a deep understanding of interactions between genes. Because gene-gene interactions are often viewed as joint contributions to known biological mechanisms, inference on the dependence among gene expressions is expected to be consistent, to some extent, with the functional characterization of genes, which can be derived from ontologies (GO, KEGG, …). The present paper introduces a sparse factor model as a general framework either to account for prior knowledge on joint contributions of modules of genes to latent biological processes or to infer the corresponding co-expression network. We propose an ℓ1-regularized EM algorithm to fit a sparse factor model for correlation. We demonstrate how it helps extract modules of genes and, more generally, improves gene clustering performance. The method is compared to alternative estimation procedures for sparse factor models of relevance networks in a simulation study. The integration of biological knowledge based on the gene ontology (GO) is also illustrated on liver expression data generated to understand adiposity variability in chicken.

Citations: 6
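For the Blum, Houée-Bigot and Causeur entry, a generic ℓ1-penalized factor model for the correlation matrix of p standardized gene expressions with q latent factors can be written as follows, where B = (b_jk) is the p × q loading matrix, Ψ holds the gene-specific variances, ℓ is the Gaussian log-likelihood and λ ≥ 0 controls sparsity. The notation is an assumption for illustration, not the paper's exact objective or its EM updates.

```latex
\begin{aligned}
\boldsymbol{\Sigma} &= \mathbf{B}\mathbf{B}^{\top} + \boldsymbol{\Psi},
  \qquad \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_p), \\
(\hat{\mathbf{B}}, \hat{\boldsymbol{\Psi}}) &= \arg\max_{\mathbf{B},\, \boldsymbol{\Psi}}
  \Big\{ \ell(\mathbf{B}, \boldsymbol{\Psi})
  - \lambda \sum_{j=1}^{p} \sum_{k=1}^{q} \lvert b_{jk} \rvert \Big\}, \\
\rho_{ij} &= \sum_{k=1}^{q} b_{ik}\, b_{jk}
  \qquad (i \neq j,\ \text{standardized expressions}).
\end{aligned}
```

Because the implied correlation between genes i and j is the sum of products of their loadings, two genes with no nonzero loading on a common factor have zero implied correlation; this is how sparse loadings translate into separate gene modules.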

Bayesian mixed-effects model for the analysis of a series of FRAP images.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology. Published: 2015-02-01. DOI: 10.1515/sagmb-2014-0013

Martina Feilke, Katrin Schneider, Volker J Schmid

Volume 14, Issue 1, pp. 35-51.

Abstract: The binding behavior of molecules in nuclei of living cells can be studied through the analysis of images from fluorescence recovery after photobleaching (FRAP) experiments. However, there is still a lack of methodology for the statistical evaluation of FRAP data, especially for the joint analysis of multiple dynamic images. We propose a hierarchical Bayesian nonlinear model with mixed-effects priors based on local compartment models, in order to obtain joint parameter estimates for all nuclei as well as to account for the heterogeneity of the nuclei population. We apply our method to a series of FRAP experiments of DNA methyltransferase 1 tagged with green fluorescent protein expressed in a somatic mouse cell line, and compare the results to those of three different fixed-effects models applied to the same series of FRAP experiments. With the proposed model, we obtain estimates of the off-rates of the interactions of the molecules under study, together with credible intervals, and additionally gain information about the variability between nuclei. The proposed model is superior to and more robust than the tested fixed-effects models. Therefore, it can be used for the joint analysis of data from FRAP experiments on various similar nuclei.

Citations: 2
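For the Feilke, Schneider and Schmid entry, what an off-rate estimate means can be illustrated with the simplest reaction-dominant FRAP model, in which the normalized recovery of a single nucleus is assumed to follow f(t) = 1 − C_eq · exp(−k_off · t), with C_eq the bound fraction at equilibrium. Taking logs gives log(1 − f) = log C_eq − k_off · t, a straight line whose slope yields k_off. This is a deliberately simplified single-nucleus, fixed-effects sketch with a hypothetical function name; it is not the local compartment model or the hierarchical Bayesian mixed-effects fit proposed in the paper, and log-linearizing noisy data distorts its error structure, so a nonlinear fit would usually be preferable on real measurements.

```python
from math import exp, log


def fit_reaction_dominant(times, frap):
    """Estimate (k_off, C_eq) for f(t) = 1 - C_eq * exp(-k_off * t)
    by ordinary least squares of log(1 - f) on t."""
    # Points with f >= 1 have no finite log(1 - f) and are skipped.
    pts = [(t, log(1.0 - f)) for t, f in zip(times, frap) if f < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with f < 1")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in pts)
    s_tl = sum((t - mean_t) * (l - mean_l) for t, l in pts)
    slope = s_tl / s_tt                   # equals -k_off
    intercept = mean_l - slope * mean_t   # equals log(C_eq)
    return -slope, exp(intercept)
```

On a noise-free curve generated with k_off = 0.2 and C_eq = 0.6, the fit recovers both values to floating-point precision; with several nuclei, a mixed-effects model such as the one in the paper would instead pool such per-nucleus parameters while allowing them to vary.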