GBDTSVM: Combined Support Vector Machine and Gradient Boosting Decision Tree Framework for efficient snoRNA-disease association prediction

Impact factor: 7.0; Zone 2 (Medicine); JCR Q1 (Biology)
Ummay Maria Muna, Fahim Hafiz, Shanta Biswas, Riasat Azim
Computers in Biology and Medicine, vol. 192, Article 110219. Published 2025-04-26. DOI: 10.1016/j.compbiomed.2025.110219
https://www.sciencedirect.com/science/article/pii/S0010482525005700
Citations: 0

Abstract

Small nucleolar RNAs (snoRNAs) are increasingly recognized for their critical role in the pathogenesis and characterization of various human diseases. Consequently, the precise identification of snoRNA-disease associations (SDAs) is essential for understanding disease progression and advancing treatment strategies. However, conventional biological experiments are costly, time-consuming, and resource-intensive, so machine learning-based computational methods offer a promising way to mitigate these limitations. This paper proposes ‘GBDTSVM’, a novel and efficient machine learning approach that predicts snoRNA-disease associations by combining a Gradient Boosting Decision Tree (GBDT) with a Support Vector Machine (SVM). ‘GBDTSVM’ uses GBDT to extract integrated snoRNA-disease feature representations, and then uses SVM to classify and identify potential associations. The method further improves prediction accuracy by incorporating Gaussian integrated profile kernel similarity for both snoRNAs and diseases. In experimental evaluation, GBDTSVM outperforms state-of-the-art methods in the field, achieving an AUROC of 0.96 and an AUPRC of 0.95 on the ‘MDRF’ dataset. The model also shows superior performance on two further datasets, ‘LSGT’ and ‘PsnoD’. Additionally, a case study on the predicted snoRNA-disease associations verified the top-ranked snoRNAs across twelve prevalent diseases, further validating the efficacy of the GBDTSVM approach. These results underscore the model’s potential as a robust tool for advancing snoRNA-related disease research. Source code and datasets for the proposed framework can be obtained from: https://github.com/mariamuna04/gbdtsvm.
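The two-stage idea in the abstract (similarity features, GBDT as feature extractor, SVM as classifier) can be illustrated with a minimal sketch. It is not the authors' implementation, which is in their repository. Everything specific here is an assumption: the similarity is taken to be the standard Gaussian interaction profile kernel with bandwidth normalized by the mean squared profile norm, the GBDT "feature representation" is taken to be one-hot encoded leaf indices, the association matrix is synthetic random data, and the names `gip_kernel` and `run_sketch` and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC


def gip_kernel(profiles):
    """Gaussian kernel between the rows of a binary association matrix.

    Assumed form: K[i, j] = exp(-gamma * ||p_i - p_j||^2), with
    gamma = 1 / mean(||p_i||^2).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def run_sketch(seed=0, n_sno=30, n_dis=12):
    """Toy GBDT -> SVM pipeline on a synthetic snoRNA-disease matrix."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for a known association matrix (rows: snoRNAs).
    assoc = (rng.random((n_sno, n_dis)) < 0.2).astype(int)
    sim_sno = gip_kernel(assoc)      # snoRNA-snoRNA similarity
    sim_dis = gip_kernel(assoc.T)    # disease-disease similarity

    # Balanced sample: every known pair plus as many random unknown pairs.
    pos = np.argwhere(assoc == 1)
    neg = np.argwhere(assoc == 0)
    neg = neg[rng.choice(len(neg), size=len(pos), replace=False)]
    pairs = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos), dtype=int), np.zeros(len(neg), dtype=int)])
    # Pair feature = snoRNA similarity row concatenated with disease similarity row.
    X = np.hstack([sim_sno[pairs[:, 0]], sim_dis[pairs[:, 1]]])

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Stage 1: GBDT as feature extractor (leaf index of each tree).
    gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=seed)
    gbdt.fit(X_tr, y_tr)
    leaves_tr = gbdt.apply(X_tr)[:, :, 0]   # shape (n_samples, n_estimators)
    leaves_te = gbdt.apply(X_te)[:, :, 0]
    encoder = OneHotEncoder(handle_unknown="ignore")
    Z_tr = encoder.fit_transform(leaves_tr)
    Z_te = encoder.transform(leaves_te)

    # Stage 2: SVM classifies pairs from the leaf-encoded features.
    svm = SVC(kernel="rbf", gamma="scale")
    svm.fit(Z_tr, y_tr)
    scores = svm.decision_function(Z_te)
    return roc_auc_score(y_te, scores), average_precision_score(y_te, scores)


if __name__ == "__main__":
    auroc, auprc = run_sketch()
    print(f"AUROC={auroc:.3f}  AUPRC={auprc:.3f}")
```

Because the toy similarities are computed from the full association matrix, including the held-out pairs, the printed scores are not a meaningful estimate of predictive performance and should not be compared with the figures reported in the abstract; the sketch only shows how the stages connect.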

Source journal
Computers in Biology and Medicine (Engineering & Technology; Engineering: Biomedical)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.