Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes.

IF 0.8 4区 数学 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY
Ekua Kotoka, Megan Orr
{"title":"Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes.","authors":"Ekua Kotoka,&nbsp;Megan Orr","doi":"10.1515/sagmb-2016-0037","DOIUrl":null,"url":null,"abstract":"<p><p>RNA-Seq is a developing technology for generating gene expression data by directly sequencing mRNA molecules in a sample. RNA-Seq data consist of counts of reads recorded to a particular gene that are often used to identify differentially expressed (DE) genes. A common statistical method used to analyze RNA-Seq data is Significance Analysis of Microarray with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depths when identifying DE genes. We propose a modification of this method that takes into account asymmetry in the distribution of the effect sizes by taking into account the sign of the test statistics. Through simulation studies, we showthat the proposed method, comparedwith the traditional SAMseqmethod and other existing methods provides better power for identifying truly DE genes or more sufficiently controls FDR in most settings where asymmetry is present. We illustrate the use of the proposed method by analyzing an RNA-Seq data set containing C57BL/6J (B6) and DBA/2J (D2) mouse strains samples.</p>","PeriodicalId":48980,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":"16 5-6","pages":"291-312"},"PeriodicalIF":0.8000,"publicationDate":"2017-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/sagmb-2016-0037","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/sagmb-2016-0037","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 1

Abstract

RNA-Seq is a developing technology for generating gene expression data by directly sequencing mRNA molecules in a sample. RNA-Seq data consist of counts of reads recorded to a particular gene that are often used to identify differentially expressed (DE) genes. A common statistical method used to analyze RNA-Seq data is Significance Analysis of Microarray with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depths when identifying DE genes. We propose a modification of this method that takes into account asymmetry in the distribution of the effect sizes by taking into account the sign of the test statistics. Through simulation studies, we showthat the proposed method, comparedwith the traditional SAMseqmethod and other existing methods provides better power for identifying truly DE genes or more sufficiently controls FDR in most settings where asymmetry is present. We illustrate the use of the proposed method by analyzing an RNA-Seq data set containing C57BL/6J (B6) and DBA/2J (D2) mouse strains samples.

在鉴定差异表达基因时,修改SAMseq以解释效应大小分布的不对称性。
RNA-Seq是一种通过直接测序样品中的mRNA分子来生成基因表达数据的新兴技术。RNA-Seq数据包括记录到特定基因的读取计数,通常用于识别差异表达(DE)基因。用于分析RNA-Seq数据的常用统计方法是强调RNA-Seq数据的微阵列显著性分析(SAMseq)。SAMseq是一种非参数方法,在鉴定DE基因时使用重采样技术来解释测序深度的差异。我们建议对这种方法进行修改,通过考虑检验统计量的符号来考虑效应大小分布的不对称性。通过仿真研究,我们表明,与传统的SAMseqmethod和其他现有方法相比,所提出的方法在大多数不对称存在的情况下,能够更好地识别真正的DE基因或更充分地控制FDR。我们通过分析包含C57BL/6J (B6)和DBA/2J (D2)小鼠品系样本的RNA-Seq数据集来说明该方法的使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Statistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology BIOCHEMISTRY & MOLECULAR BIOLOGY-MATHEMATICAL & COMPUTATIONAL BIOLOGY
自引率
11.10%
发文量
8
期刊介绍: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信