{"title":"EnrichDO: a global weighted model for Disease Ontology enrichment analysis.","authors":"Haixiu Yang, Hongyu Fu, Meiyi Zhang, Yangyang Liu, Yongqun Oliver He, Chao Wang, Liang Cheng","doi":"10.1093/gigascience/giaf021","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Disease Ontology (DO) has been widely studied in biomedical research and clinical practice to describe the roles of genes. DO enrichment analysis is an effective means to discover associations between genes and diseases. Compared to hundreds of Gene Ontology (GO)-based enrichment analysis methods, however, DO-based methods are relatively scarce, and most current DO-based approaches are term-for-term and thus are unable to solve over-enrichment problems caused by the \"true-path\" rule.</p><p><strong>Results: </strong>Here, we describe a novel double-weighted model, EnrichDO, which leverages the latest annotations of the human genome with DO terms and integrates DO graph topology on a global scale. Compared to classic enrichment methods (mainly for GO) and existing DO-based enrichment tools, EnrichDO performs better in both GO and DO enrichment analysis cases. It can accurately identify more specific terms, without ignoring the truly associated parent terms, as shown in the Alzheimer's disease (AD) case (AD ranked first). Moreover, both a simulated test and a data perturbation test validate the accuracy and robustness of EnrichDO. Finally, EnrichDO is applied to other types of datasets to expand its application, including gene expression profile datasets, a host gene set of microorganisms, and hallmark gene sets. Based on the findings reported here, EnrichDO shows significant improvement via all experimental results.</p><p><strong>Conclusions: </strong>EnrichDO provides an effective DO enrichment analysis model for gaining insight into the significance of a particular gene set in the context of disease. To increase the usability of EnrichDO, we have developed an R-based software package, which is freely available through Bioconductor (https://bioconductor.org/packages/release/bioc/html/EnrichDO.html) or at https://github.com/liangcheng-hrbmu/EnrichDO.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11945307/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giaf021","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Disease Ontology (DO) has been widely studied in biomedical research and clinical practice to describe the roles of genes. DO enrichment analysis is an effective means to discover associations between genes and diseases. Compared to hundreds of Gene Ontology (GO)-based enrichment analysis methods, however, DO-based methods are relatively scarce, and most current DO-based approaches are term-for-term and thus are unable to solve over-enrichment problems caused by the "true-path" rule.
Results: Here, we describe a novel double-weighted model, EnrichDO, which leverages the latest annotations of the human genome with DO terms and integrates DO graph topology on a global scale. Compared to classic enrichment methods (mainly for GO) and existing DO-based enrichment tools, EnrichDO performs better in both GO and DO enrichment analysis cases. It can accurately identify more specific terms, without ignoring the truly associated parent terms, as shown in the Alzheimer's disease (AD) case (AD ranked first). Moreover, both a simulated test and a data perturbation test validate the accuracy and robustness of EnrichDO. Finally, EnrichDO is applied to other types of datasets to expand its application, including gene expression profile datasets, a host gene set of microorganisms, and hallmark gene sets. Based on the findings reported here, EnrichDO shows significant improvement via all experimental results.
Conclusions: EnrichDO provides an effective DO enrichment analysis model for gaining insight into the significance of a particular gene set in the context of disease. To increase the usability of EnrichDO, we have developed an R-based software package, which is freely available through Bioconductor (https://bioconductor.org/packages/release/bioc/html/EnrichDO.html) or at https://github.com/liangcheng-hrbmu/EnrichDO.
期刊介绍:
GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.