GCNMF-SDA: predicting snoRNA-disease associations based on graph convolution and non-negative matrix factorization.

Impact factor: 7.7 | Zone 2, Biology | JCR Q1, Biochemical Research Methods
Yaowu Zhang, Xiu Jin, Xiaodan Zhang
Journal: Briefings in Bioinformatics, volume 26, issue 5
Publication date: 2025-08-31
Publication type: Journal Article
DOI: 10.1093/bib/bbaf453 (https://doi.org/10.1093/bib/bbaf453)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12409419/pdf/
Citation count: 0

Abstract

Small nucleolar RNAs (snoRNAs) play crucial roles in a wide range of biological processes, and studying their associations with diseases can enhance our understanding of disease pathogenesis. Nevertheless, current knowledge of these associations is limited, and traditional biological experiments are both costly and time-consuming. Consequently, efficient computational methods are needed to predict potential snoRNA-disease associations. We propose GCNMF-SDA, a novel method for predicting snoRNA-disease associations based on non-negative matrix factorization and graph convolution. First, five different types of similarity information from snoRNA and disease entities are introduced to fully mine and refine the feature information. Then the snoRNA and disease similarity networks are integrated using the nonlinear Similarity Network Fusion (SNF) approach, while the weighted K nearest known neighbors (WKNKN) algorithm is applied to optimize the snoRNA-disease association matrix. Following this, the graph convolution module and the non-negative matrix factorization module extract disease features and snoRNA features, respectively. The extracted features are then combined into a composite feature vector for each snoRNA-disease pair. Finally, the composite feature vectors, along with their corresponding labels, are input into a multilayer perceptron for training. Under five-fold cross-validation, GCNMF-SDA achieves an area under the receiver operating characteristic curve (AUC-ROC) of 0.9659 and an area under the precision-recall curve (AUC-PR) of 0.9522. Furthermore, most of the novel associations identified by GCNMF-SDA were validated through case studies, supporting the method's reliability in predicting potential relationships between snoRNAs and diseases.
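
The abstract names the pipeline stages but gives no formulas or hyperparameters, so the following is only a minimal NumPy sketch of three of those stages on toy data: WKNKN smoothing of the association matrix, non-negative matrix factorization with standard multiplicative updates for snoRNA features, and a single graph-convolution propagation step for disease features, followed by concatenation into a pair vector. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`wknkn`, `nmf`, `gcn_layer`), the values of `k`, `eta`, `rank`, and the hidden size, the toy matrices, the choice of association profiles as GCN input, and the random untrained GCN weights. SNF fusion and the multilayer perceptron are omitted.

```python
import numpy as np

def _fill(Y, S, k, eta):
    """For each row entity, average the profiles of its k most similar neighbors."""
    k = min(k, Y.shape[0] - 1)
    decay = eta ** np.arange(k)          # neighbor weights decay with rank
    out = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        sim = S[i].astype(float)
        sim[i] = -np.inf                 # exclude the entity itself
        nn = np.argsort(-sim)[:k]        # indices of the k nearest neighbors
        norm = S[i, nn].sum()
        if norm > 0:
            out[i] = (decay * S[i, nn]) @ Y[nn] / norm
    return out

def wknkn(Y, S_row, S_col, k=2, eta=0.7):
    """Replace zeros in Y with neighbor-based likelihoods; known 1s are kept."""
    Y = Y.astype(float)
    Y_row = _fill(Y, S_row, k, eta)
    Y_col = _fill(Y.T, S_col, k, eta).T
    return np.maximum(Y, (Y_row + Y_col) / 2)

def nmf(Y, rank=2, iters=200, eps=1e-9, seed=0):
    """Factor Y ~ W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], rank))
    H = rng.random((rank, Y.shape[1]))
    for _ in range(iters):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gcn_layer(A, X, W):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

# Toy data (hypothetical): 4 snoRNAs, 3 diseases.
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
S_sno = np.array([[1.0, 0.2, 0.1, 0.8],
                  [0.2, 1.0, 0.3, 0.1],
                  [0.1, 0.3, 1.0, 0.2],
                  [0.8, 0.1, 0.2, 1.0]])
S_dis = np.array([[1.0, 0.4, 0.1],
                  [0.4, 1.0, 0.5],
                  [0.1, 0.5, 1.0]])

Y_filled = wknkn(Y, S_sno, S_dis)                 # smoothed association matrix
W_sno, H = nmf(Y_filled, rank=2)                  # rows of W_sno: snoRNA features
W0 = np.random.default_rng(1).standard_normal((4, 2))  # untrained GCN weights
dis_feat = gcn_layer(S_dis, Y_filled.T, W0)       # rows: disease features

# Composite feature vector for the pair (snoRNA 0, disease 1).
pair_vec = np.concatenate([W_sno[0], dis_feat[1]])
```

In a full pipeline, one such vector per snoRNA-disease pair, together with its 0/1 label, would be the classifier input; here the GCN weights are random, so `dis_feat` only illustrates the shape of the computation.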

Source journal: Briefings in Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.