Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.

IF 1.5 4区生物学 Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2018-12-08 eCollection Date: 2018-01-01 DOI:10.1186/s13015-018-0135-2

Morten Muhlig Nielsen, Paula Tataru, Tobias Madsen, Asger Hobolth, Jakob Skou Pedersen

{"title":"Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.","authors":"Morten Muhlig Nielsen, Paula Tataru, Tobias Madsen, Asger Hobolth, Jakob Skou Pedersen","doi":"10.1186/s13015-018-0135-2","DOIUrl":null,"url":null,"abstract":"Background: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces, and thus more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored for analyzing specific complex motifs motivated by biological questions and hypotheses rather than acting as a screen based motif finding tool.Methods: We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank based statistics. A modular setup and fast analytic p value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.Results: We demonstrate use cases of combined motifs on simulated data and on expression data from micro RNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.Conclusions: Regmex is a useful and flexible tool to analyze motif hypotheses that relates to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"13 ","pages":"17"},"PeriodicalIF":1.5000,"publicationDate":"2018-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13015-018-0135-2","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-018-0135-2","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2018/1/1 0:00:00","PubModel":"eCollection","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 4

Abstract

Background: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces, and thus more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored for analyzing specific complex motifs motivated by biological questions and hypotheses rather than acting as a screen based motif finding tool.

Methods: We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank based statistics. A modular setup and fast analytic p value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.

Results: We demonstrate use cases of combined motifs on simulated data and on expression data from micro RNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.

Conclusions: Regmex is a useful and flexible tool to analyze motif hypotheses that relates to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).

Abstract Image

查看原文本刊更多论文

Regmex:用于从基因组学实验中探索排序序列列表中的基序的统计工具。

背景:基序分析方法一直是研究核苷酸序列生物学功能的核心方法。功能基因组学实验拓展了它们的潜力。它们通常根据实验获得的功能特性(如基因表达或蛋白质结合亲和力)生成序列列表。当前的基序发现工具在搜索大型基序空间时受到限制，因此可能无法包括更复杂的基序。因此，需要为分析由生物学问题和假设驱动的特定复杂基序而量身定制的基序分析方法，而不是作为基于屏幕的基序查找工具。方法:我们提出Regmex(正则表达式Motif EXplorer)，它提供了几种方法来识别序列排名列表中过度表示的Motif。Regmex使用正则表达式来定义基序或基序族，并使用嵌入式马尔可夫模型来计算序列中基序观察的精确p值。通过随机漫步、布朗桥或修改的基于秩的统计来评估排序序列列表中基序分布的偏差。模块化的设置和快速的分析p值评估使Regmex适用于各种和潜在的大规模基序分析问题。结果:我们在模拟数据和微RNA转染实验的表达数据上展示了组合基序的用例。我们证实了先前获得的结果，并证明Regmex的可用性，以测试关于microRNA种子位点和富u基序的相对位置的特定假设。我们进一步将该工具与现有的motif发现工具进行比较，并显示出更高的灵敏度。结论:Regmex是一个有用且灵活的工具，用于分析与功能基因组学中大型数据集相关的基序假设。该方法可以作为R包获得(https://github.com/muhligs/regmex)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Algorithms for Molecular Biology 生物-生化研究方法

CiteScore

2.40

自引率

10.00%

发文量

审稿时长

>12 weeks

期刊介绍： Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.