Annals of Applied Statistics: Latest Articles

REFINING CELLULAR PATHWAY MODELS USING AN ENSEMBLE OF HETEROGENEOUS DATA SOURCES.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-09-01 Epub Date: 2018-09-11 DOI: 10.1214/16-aoas915
Alexander M Franks, Florian Markowetz, Edoardo M Airoldi
Abstract: Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Volume 12, Issue 3, pp. 1361-1384. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9733905/pdf/nihms-1823482.pdf
Citations: 0
A TESTING BASED APPROACH TO THE DISCOVERY OF DIFFERENTIALLY CORRELATED VARIABLE SETS.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-06-01 Epub Date: 2018-07-28 DOI: 10.1214/17-AOAS1083
Kelly Bodwin, Kai Zhang, Andrew Nobel
Abstract: Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as to recent experimental datasets in genomics and brain imaging.
Volume 12, Issue 2, pp. 1180-1203. PDF: https://sci-hub-pdf.com/10.1214/17-AOAS1083
Citations: 9
ADJUSTED REGULARIZATION IN LATENT GRAPHICAL MODELS: APPLICATION TO MULTIPLE-NEURON SPIKE COUNT DATA.
IF 1.3, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-06-01 Epub Date: 2018-07-28 DOI: 10.1214/18-AOAS1190
Giuseppe Vinci, Valérie Ventura, Matthew A Smith, Robert E Kass
Abstract: A major challenge in contemporary neuroscience is to analyze data from large numbers of neurons recorded simultaneously across many experimental replications (trials), where the data are counts of neural firing events, and one of the basic problems is to characterize the dependence structure among such multivariate counts. Methods of estimating high-dimensional covariation based on ℓ1-regularization are most appropriate when there are a small number of relatively large partial correlations, but in neural data there are often large numbers of relatively small partial correlations. Furthermore, the variation across trials is often confounded by Poisson-like variation within trials. To overcome these problems we introduce a comprehensive methodology that imbeds a Gaussian graphical model into a hierarchical structure: the counts are assumed Poisson, conditionally on latent variables that follow a Gaussian graphical model, and the graphical model parameters, in turn, are assumed to depend on physiologically-motivated covariates, which can greatly improve correct detection of interactions (non-zero partial correlations). We develop a Bayesian approach to fitting this covariate-adjusted generalized graphical model and we demonstrate its success in simulation studies. We then apply it to data from an experiment on visual attention, where we assess functional interactions between neurons recorded from two brain areas.
Volume 12, Issue 2, pp. 1068-1095. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6879176/pdf/nihms-1014977.pdf
Citations: 0
Estimating Large Correlation Matrices for International Migration.
IF 1.3, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-06-01 Epub Date: 2018-07-28 DOI: 10.1214/18-aoas1175
Jonathan J Azose, Adrian E Raftery
Abstract: The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, in the data we consider there are 200 countries and only 12 data points, each one corresponding to a five-year time period. Thus a 200 × 200 correlation matrix must be estimated on the basis of 12 data points. Using Pearson correlations produces many spurious correlations. We propose a maximum a posteriori estimator for the correlation matrix with an interpretable informative prior distribution. The prior serves to regularize the correlation matrix, shrinking a priori untrustworthy elements towards zero. Our estimated correlation structure improves projections of net migration for regional aggregates, producing narrower projections of migration for Africa as a whole and wider projections for Europe. A simulation study confirms that our estimator outperforms both the Pearson correlation matrix and a simple shrinkage estimator when estimating a sparse correlation matrix.
Volume 12, Issue 2, pp. 940-970. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7164801/pdf/nihms-1029425.pdf
Citations: 0
KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-03-01 Epub Date: 2018-03-09 DOI: 10.1214/17-AOAS1102
Timothy W Randolph, Sen Zhao, Wade Copeland, Meredith Hullar, Ali Shojaie
Abstract: The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors comprised of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and a percent fat.
Volume 12, Issue 1, pp. 540-566. PDF: https://sci-hub-pdf.com/10.1214/17-AOAS1102
Citations: 38
A MULTI-RESOLUTION MODEL FOR NON-GAUSSIAN RANDOM FIELDS ON A SPHERE WITH APPLICATION TO IONOSPHERIC ELECTROSTATIC POTENTIALS.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-03-01 Epub Date: 2018-03-09 DOI: 10.1214/17-AOAS1104
Minjie Fan, Debashis Paul, Thomas C M Lee, Tomoko Matsuo
Abstract: Gaussian random fields have been one of the most popular tools for analyzing spatial data. However, many geophysical and environmental processes often display non-Gaussian characteristics. In this paper, we propose a new class of spatial models for non-Gaussian random fields on a sphere based on a multi-resolution analysis. Using a special wavelet frame, named spherical needlets, as building blocks, the proposed model is constructed in the form of a sparse random effects model. The spatial localization of needlets, together with carefully chosen random coefficients, ensure the model to be non-Gaussian and isotropic. The model can also be expanded to include a spatially varying variance profile. The special formulation of the model enables us to develop efficient estimation and prediction procedures, in which an adaptive MCMC algorithm is used. We investigate the accuracy of parameter estimation of the proposed model, and compare its predictive performance with that of two Gaussian models by extensive numerical experiments. Practical utility of the proposed model is demonstrated through an application of the methodology to a data set of high-latitude ionospheric electrostatic potentials, generated from the LFM-MIX model of the magnetosphere-ionosphere system.
Volume 12, Issue 1, pp. 459-489. PDF: https://sci-hub-pdf.com/10.1214/17-AOAS1104
Citations: 7
POWERFUL TEST BASED ON CONDITIONAL EFFECTS FOR GENOME-WIDE SCREENING.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-03-01 Epub Date: 2018-03-09 DOI: 10.1214/17-AOAS1103
Yaowu Liu, Jun Xie
Abstract: This paper considers testing procedures for screening large genome-wide data, where we examine hundreds of thousands of genetic variants, e.g., single nucleotide polymorphisms (SNP), on a quantitative phenotype. We screen the whole genome by SNP sets and propose a new test that is based on conditional effects from multiple SNPs. The test statistic is developed for weak genetic effects and incorporates correlations among genetic variables, which may be very high due to linkage disequilibrium. The limiting null distribution of the test statistic and the power of the test are derived. Under appropriate conditions, the test is shown to be more powerful than the minimum p-value method, which is based on marginal SNP effects and is the most commonly used method in genome-wide screening. The proposed test is also compared with other existing methods, including the Higher Criticism (HC) test and the sequence kernel association test (SKAT), through simulations and analysis of a real genome data set. For typical genome-wide data, where effects of individual SNPs are weak and correlations among SNPs are high, the proposed test is more advantageous and clearly outperforms the other methods in the literature.
Volume 12, Issue 1, pp. 567-585. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5931742/pdf/nihms910242.pdf
Citations: 0
MSIQ: JOINT MODELING OF MULTIPLE RNA-SEQ SAMPLES FOR ACCURATE ISOFORM QUANTIFICATION.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-03-01 Epub Date: 2018-03-09 DOI: 10.1214/17-AOAS1100
Wei Vivian Li, Anqi Zhao, Shihua Zhang, Jingyi Jessica Li
Abstract: Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and unrobust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples by allowing for higher weights on the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line. We also perform a comprehensive analysis of how the isoform quantification accuracy would be affected by RNA-seq sample heterogeneity and different experimental protocols.
Volume 12, Issue 1, pp. 510-539. PDF: https://sci-hub-pdf.com/10.1214/17-AOAS1100
Citations: 4
DESIGN OF VACCINE TRIALS DURING OUTBREAKS WITH AND WITHOUT A DELAYED VACCINATION COMPARATOR.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-03-01 Epub Date: 2018-03-09 DOI: 10.1214/17-AOAS1095
Natalie E Dean, M Elizabeth Halloran, Ira M Longini
Abstract: Conducting vaccine efficacy trials during outbreaks of emerging pathogens poses particular challenges. The "Ebola ça suffit" trial in Guinea used a novel ring vaccination cluster randomized design to target populations at highest risk of infection. Another key feature of the trial was the use of a delayed vaccination arm as a comparator, in which clusters were randomized to immediate vaccination or vaccination 21 days later. This approach, chosen to improve ethical acceptability of the trial, complicates the statistical analysis as participants in the comparison arm are eventually protected by vaccine. Furthermore, for infectious diseases, we observe time of illness onset and not time of infection, and we may not know the time required for the vaccinee to develop a protective immune response. As a result, including events observed shortly after vaccination may bias the per protocol estimate of vaccine efficacy. We provide a framework for approximating the bias and power of any given analysis period as functions of the background infection hazard rate, disease incubation period, and vaccine immune response. We use this framework to provide recommendations for designing standard vaccine efficacy trials and trials with a delayed vaccination comparator. Briefly, narrower analysis periods within the correct window can minimize or eliminate bias but may suffer from reduced power. Designs should be reasonably robust to misspecification of the incubation period and time to develop a vaccine immune response.
Volume 12, Issue 1, pp. 330-347. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5878056/pdf/nihms949833.pdf
Citations: 0
A UNIFIED STATISTICAL FRAMEWORK FOR SINGLE CELL AND BULK RNA SEQUENCING DATA.
IF 1.8, Zone 4 (Mathematics)
Annals of Applied Statistics Pub Date : 2018-03-01 Epub Date: 2018-03-09 DOI: 10.1214/17-AOAS1110
Lingxue Zhu, Jing Lei, Bernie Devlin, Kathryn Roeder
Abstract: Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the "dropout" events. A "dropout" happens when the RNA for a gene fails to be amplified prior to sequencing, producing a "false" zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profile. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes' approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data, as well as in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
Volume 12, Issue 1, pp. 609-632. PDF: https://sci-hub-pdf.com/10.1214/17-AOAS1110
Citations: 53