FDR-FET: an optimizing gene set enrichment analysis method.

Q2 Biochemistry, Genetics and Molecular Biology

Advances and Applications in Bioinformatics and Chemistry. Pub Date: 2011-01-01. Epub Date: 2011-03-15. DOI: 10.2147/AABC.S15840

Rui-Ru Ji, Karl-Heinz Ott, Roumyana Yordanova, Robert E Bruccoleri

Volume 4, pages 37-42. Journal Article. https://doi.org/10.2147/AABC.S15840

Citations: 3

Abstract

Gene set enrichment analysis of large profiling and screening experiments can reveal unifying biological themes based on previously accumulated knowledge represented as "gene sets". Most existing implementations use a fixed fold-change or P value cutoff to generate regulated gene lists. However, the threshold selection is in most cases arbitrary, and it has a significant effect on the test outcome and on the interpretation of the experiment. We developed a new gene set enrichment analysis method, FDR-FET, which dynamically optimizes the threshold choice and improves the sensitivity and selectivity of gene set enrichment analysis. The procedure translates experimental results into a series of regulated gene lists at multiple false discovery rate (FDR) cutoffs, and computes the P value for the overrepresentation of a gene set in each of these gene lists using Fisher's exact test (FET). The lowest P value is retained to represent the significance of the gene set. We also implemented improved methods to define a more relevant global reference set for the FET. We demonstrate the validity of the method using a published microarray study of three protease inhibitors of the human immunodeficiency virus and compare the results with those from other popular gene set enrichment analysis algorithms. Our results show that combining FDR with multiple cutoffs allows us to control the error while retaining genes that increase information content. We conclude that FDR-FET can selectively identify significantly affected biological processes. Our method can be used for any user-generated gene list in transcriptome, proteome, and other biological and scientific applications.
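The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the FDR is computed here with the Benjamini-Hochberg adjustment, the cutoff values are placeholders, the reference set is simply every measured gene (the abstract mentions improved reference-set definitions but does not describe them), and all function names are hypothetical.

```python
from math import comb


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def fisher_overrepresentation(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k), X hypergeometric.

    N = reference genes, K = gene-set members in the reference,
    n = regulated genes, k = regulated genes that belong to the gene set.
    """
    upper = min(n, K)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


def fdr_fet(gene_pvalues, gene_set, fdr_cutoffs=(0.01, 0.05, 0.10, 0.20)):
    """Return (lowest P value, FDR cutoff that produced it) for one gene set.

    gene_pvalues maps gene -> per-gene P value from the experiment.
    The default cutoffs are illustrative assumptions, not from the paper.
    """
    genes = list(gene_pvalues)
    qvalues = dict(zip(genes, bh_adjust([gene_pvalues[g] for g in genes])))
    reference = set(genes)  # assumption: reference set = all measured genes
    members = reference & set(gene_set)
    best_p, best_cutoff = 1.0, None
    for cutoff in fdr_cutoffs:
        # One regulated gene list per FDR cutoff.
        regulated = {g for g in genes if qvalues[g] <= cutoff}
        if not regulated:
            continue
        p = fisher_overrepresentation(
            len(regulated & members), len(regulated), len(members), len(reference)
        )
        if p < best_p:  # retain the lowest P value across cutoffs
            best_p, best_cutoff = p, cutoff
    return best_p, best_cutoff


if __name__ == "__main__":
    # Toy data: 5 strongly regulated genes out of 20 measured genes.
    toy = {f"g{i}": (0.001 if i < 5 else 0.8) for i in range(20)}
    print(fdr_fet(toy, {"g0", "g1", "g2", "g3", "g10"}))
```

Using the hypergeometric tail directly keeps the sketch dependency-free; in practice a library routine for Fisher's exact test would serve equally well.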


Source journal

Advances and Applications in Bioinformatics and Chemistry. Biochemistry, Genetics and Molecular Biology: Biochemistry, Genetics and Molecular Biology (miscellaneous)

CiteScore

6.50

Self-citation rate

0.00%

Articles published

Review time

16 weeks