RNA knowledge-graph analysis through homogeneous embedding methods.

Impact factor: 2.8 (JCR Q2, Mathematical & Computational Biology)
Bioinformatics Advances. Pub Date: 2025-05-13. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf109
Francesco Torgano, Mauricio Soto Gomez, Matteo Zignani, Jessica Gliozzo, Emanuele Cavalleri, Marco Mesiti, Elena Casiraghi, Giorgio Valentini
Citations: 0

RNA knowledge-graph analysis through homogeneous embedding methods.

Motivation: We recently introduced the RNA knowledge graph (RNA-KG), an ontology-based knowledge graph (KG) that integrates biological data on RNAs from over 60 public databases. RNA-KG captures functional relationships and interactions between RNA molecules and other biomolecules, chemicals, and biomedical concepts such as diseases and phenotypes, all represented within graph-structured bio-ontologies. We present the first comprehensive computational analysis of RNA-KG, evaluating the potential of graph representation learning and machine learning models to predict node types and edges within the graph.

Results: We performed node classification experiments to predict up to 81 distinct node types, along with both generic- and specific-edge prediction tasks. Generic-edge prediction identifies the presence of an edge irrespective of its type, while specific-edge prediction targets particular interactions involving ncRNAs, e.g. between microRNAs (miRNA-miRNA) or between small interfering RNAs and messenger RNAs (siRNA-mRNA), or relationships between ncRNAs and biomedical concepts, e.g. miRNA-disease or lncRNA-Gene Ontology term relationships. Using embedding methods for homogeneous graphs, such as Large-scale Information Network Embedding (LINE) and node2vec, in combination with machine learning models such as decision trees and random forests, we achieved balanced accuracy exceeding 90% for the 20 most common node types and over 80% for most specific-edge prediction tasks. These results show that simple embedding methods for homogeneous graphs can successfully predict nodes and edges of RNA-KG, paving the way to the discovery of novel ncRNA interactions and laying the foundation for further exploration and use of this rich information source to enhance prediction accuracy and support further research into the "RNA world."
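
To make two ideas in the Results concrete, the following is a minimal sketch and not the authors' pipeline (their code is in the repository listed under Availability). It shows how a typed knowledge graph can be treated as a homogeneous graph by discarding edge types, how a node2vec-style biased random walk can be sampled over it, and how balanced accuracy is computed as the mean of per-class recall. The triples, relation names, and function names are all hypothetical and chosen only for illustration; nothing here is taken from RNA-KG itself.

```python
import random
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); NOT real RNA-KG content.
TRIPLES = [
    ("miRNA:a", "interacts_with", "mRNA:x"),
    ("miRNA:a", "associated_with", "disease:d1"),
    ("miRNA:b", "interacts_with", "mRNA:x"),
    ("miRNA:b", "associated_with", "disease:d1"),
    ("lncRNA:c", "annotated_with", "GO:t1"),
    ("lncRNA:c", "interacts_with", "miRNA:a"),
]


def homogeneous_adjacency(triples):
    """Drop edge types and direction: each triple becomes a plain undirected edge."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Sample one second-order biased walk using node2vec-style p (return) and q (in-out) weights."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # return to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)      # stay within one hop of the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    return sum(correct[c] / total[c] for c in total) / len(total)


adj = homogeneous_adjacency(TRIPLES)
rng = random.Random(0)  # fixed seed so repeated runs sample the same walks
walks = [node2vec_walk(adj, node, length=5, p=0.5, q=2.0, rng=rng) for node in sorted(adj)]

# A classifier that always predicts the majority type gets 75% plain accuracy
# on this toy labelling, but its balanced accuracy is only 0.5.
majority_score = balanced_accuracy(
    ["miRNA", "miRNA", "miRNA", "mRNA"],
    ["miRNA", "miRNA", "miRNA", "miRNA"],
)
```

In a full workflow, walks like these would typically be passed to a skip-gram model to obtain node embeddings, which are then fed to a classifier such as a random forest; those steps are omitted here. The balanced-accuracy example illustrates why the metric suits a graph with many node types of very different frequency: a model that ignores rare types is penalised even when its plain accuracy looks high.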

Availability and implementation: Python code to reproduce the experiments is available at https://github.com/AnacletoLAB/RNA-KG_homogeneous_emb_analysis.
