Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources

IF 16.6 2区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Ido Diamant, Daniel J B Clarke, John Erol Evangelista, Nathania Lingam, Avi Ma’ayan
{"title":"Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources","authors":"Ido Diamant, Daniel J B Clarke, John Erol Evangelista, Nathania Lingam, Avi Ma’ayan","doi":"10.1093/nar/gkae1080","DOIUrl":null,"url":null,"abstract":"By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types such as cells and tissues, diseases, and pathways. The upgrade has a dataset crossing feature to identify gene modules that are shared across datasets. To further explain significantly high gene set overlap between dataset pairs, a large language model (LLM) composes a paragraph that speculates about the reasons behind the high overlap. The upgrade also adds more data formats and visualization options. Datasets are downloadable as knowledge graph (KG) assertions and visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored via a user interface that visualizes gene–attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets that are provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"250 1","pages":""},"PeriodicalIF":16.6000,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nucleic Acids Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/nar/gkae1080","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types such as cells and tissues, diseases, and pathways. The upgrade has a dataset crossing feature to identify gene modules that are shared across datasets. To further explain significantly high gene set overlap between dataset pairs, a large language model (LLM) composes a paragraph that speculates about the reasons behind the high overlap. The upgrade also adds more data formats and visualization options. Datasets are downloadable as knowledge graph (KG) assertions and visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored via a user interface that visualizes gene–attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets that are provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.
Harmonizome 3.0:来自不同多组学资源的基因和蛋白质综合知识
Harmonizome 数据库通过将各种 omics 数据集处理和抽象为基因与其属性之间的关联,使研究人员能够从许多中央 omics 资源中探索和整合有关人类基因的知识。在此,我们介绍 Harmonizome 3.0,这是对原始 Harmonizome 数据库的重大升级。此次升级增加了 26 个数据集,这些数据集提供了近 1200 万个基因与细胞和组织、疾病和通路等各种属性类型之间的关联。升级版具有数据集交叉功能,可识别跨数据集共享的基因模块。为了进一步解释数据集对之间基因组的高重合度,一个大型语言模型(LLM)会撰写一段文字,推测高重合度背后的原因。此次升级还增加了更多数据格式和可视化选项。数据集可以知识图谱(KG)断言的形式下载,并通过统一表层逼近和投影(UMAP)图进行可视化。可通过用户界面探索 KG 断言,该界面将基因属性关联可视化为球棍图。总之,Harmonizome 3.0 是一个包含丰富的经处理的 omics 数据集的资源库,以多种 AI 就绪格式提供。Harmonizome 3.0 可在 https://maayanlab.cloud/Harmonizome/ 上获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Nucleic Acids Research
Nucleic Acids Research 生物-生化与分子生物学
CiteScore
27.10
自引率
4.70%
发文量
1057
审稿时长
2 months
期刊介绍: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信