Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes.

IF 0.4 | Zone 4 (Mathematics) | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY

Statistical Applications in Genetics and Molecular Biology | Pub Date: 2017-11-27 | DOI: 10.1515/sagmb-2016-0037

Ekua Kotoka, Megan Orr

Volume 16 (5-6), pages 291-312. Publication type: Journal Article.

Citations: 1

Abstract

RNA-Seq is a developing technology for generating gene expression data by directly sequencing mRNA molecules in a sample. RNA-Seq data consist of counts of reads recorded to a particular gene and are often used to identify differentially expressed (DE) genes. A common statistical method used to analyze RNA-Seq data is Significance Analysis of Microarray with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depths when identifying DE genes. We propose a modification of this method that accounts for asymmetry in the distribution of the effect sizes by taking into account the sign of the test statistics. Through simulation studies, we show that the proposed method, compared with the traditional SAMseq method and other existing methods, provides better power for identifying truly DE genes or controls the FDR more adequately in most settings where asymmetry is present. We illustrate the use of the proposed method by analyzing an RNA-Seq data set containing samples from the C57BL/6J (B6) and DBA/2J (D2) mouse strains.
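The idea of treating the two signs of a rank-based statistic separately can be illustrated with a small sketch. This is not the authors' procedure and not the SAMseq implementation: it omits SAMseq's resampling step for sequencing depth and any estimate of the proportion of null genes, and all function names (`midranks`, `centered_rank_sum`, `permutation_null`, `tail_fdr`) are hypothetical. It assumes a two-group design, a Wilcoxon-type rank-sum statistic centered at its null expectation, and a plain label-permutation null, and it estimates the FDR for the positive and negative tails separately instead of pooling them through the absolute value.

```python
import random


def midranks(values):
    """Return 1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def centered_rank_sum(values, labels):
    """Rank sum of the samples labelled 1, minus its expectation under the null.

    A positive value suggests higher expression in group 1, a negative value
    suggests lower expression, so the sign carries the direction of the effect.
    """
    ranks = midranks(values)
    n = len(values)
    n1 = sum(labels)
    observed = sum(r for r, g in zip(ranks, labels) if g == 1)
    return observed - n1 * (n + 1) / 2


def permutation_null(data, labels, n_perm, seed=0):
    """Statistics for every gene (row of data) under n_perm label shuffles."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null.append([centered_rank_sum(row, shuffled) for row in data])
    return null


def tail_fdr(observed, null, cutoff, tail):
    """Estimated FDR for one tail: 'up' (T >= cutoff) or 'down' (T <= -cutoff).

    Assumption of this sketch: the expected number of false calls is the
    average count of permuted statistics that fall in the same tail.
    """
    if tail == "up":
        def hit(t):
            return t >= cutoff
    else:
        def hit(t):
            return t <= -cutoff
    called = sum(hit(t) for t in observed)
    if called == 0:
        return 0.0
    expected_false = sum(sum(hit(t) for t in perm) for perm in null) / len(null)
    return min(1.0, expected_false / called)
```

With per-gene statistics computed by `centered_rank_sum` and a null from `permutation_null`, calling `tail_fdr` once with `tail="up"` and once with `tail="down"` lets the two directions use different cutoffs, which is the kind of flexibility a symmetric rule based on the absolute value of the statistic does not offer when most DE genes change in one direction.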


Source journal

Statistical Applications in Genetics and Molecular Biology (BIOCHEMISTRY & MOLECULAR BIOLOGY; MATHEMATICAL & COMPUTATIONAL BIOLOGY)

Self-citation rate

11.10%


Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. Papers should focus on the relevant statistical issues but should contain a succinct description of the biological problem being considered. The range of topics is wide and includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.