GBDTSVM: Combined Support Vector Machine and Gradient Boosting Decision Tree Framework for efficient snoRNA-disease association prediction
Authors: Ummay Maria Muna, Fahim Hafiz, Shanta Biswas, Riasat Azim
Journal: Computers in Biology and Medicine, volume 192, Article 110219 (JCR Q1, Biology; impact factor 7.0)
Published: 2025-04-26
DOI: 10.1016/j.compbiomed.2025.110219
URL: https://www.sciencedirect.com/science/article/pii/S0010482525005700
Citations: 0
Abstract
Small nucleolar RNAs (snoRNAs) are increasingly recognized for their critical role in the pathogenesis and characterization of various human diseases. Consequently, the precise identification of snoRNA-disease associations (SDAs) is essential for understanding disease progression and advancing treatment strategies. However, conventional biological experiments are costly, time-consuming, and resource-intensive, so machine learning-based computational methods offer a promising way to mitigate these limitations. This paper proposes ‘GBDTSVM’, a novel and efficient machine learning approach that predicts snoRNA-disease associations by combining a Gradient Boosting Decision Tree (GBDT) with a Support Vector Machine (SVM). GBDT first extracts integrated snoRNA-disease feature representations, and the SVM then classifies candidate pairs to identify potential associations. The method further improves prediction accuracy by incorporating Gaussian integrated profile kernel similarity for both snoRNAs and diseases. In experimental evaluation, GBDTSVM outperforms state-of-the-art methods, achieving an AUROC of 0.96 and an AUPRC of 0.95 on the ‘MDRF’ dataset, and it also shows superior performance on two further datasets, ‘LSGT’ and ‘PsnoD’. Additionally, a case study on the predicted snoRNA-disease associations verified the top-ranked snoRNAs across twelve prevalent diseases, further validating the efficacy of the approach. These results underscore the model’s potential as a robust tool for advancing snoRNA-related disease research. Source code and datasets for the proposed framework are available at: https://github.com/mariamuna04/gbdtsvm.
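The pipeline described in the abstract (kernel similarity, GBDT feature extraction, SVM classification) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the function names, the toy data, and the hyperparameters are hypothetical, the similarity is assumed to follow the commonly used Gaussian interaction profile formulation, and GBDT features are assumed to be one-hot encoded leaf indices, which is one common way to use a GBDT as a feature extractor. The abstract does not confirm any of these details.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC


def gip_kernel(profiles):
    """Gaussian kernel between the rows of a binary association matrix.

    Assumption: bandwidth is normalised by the mean squared norm of the
    profiles, a convention often used for interaction-profile kernels.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def pair_features(sno_sim, dis_sim, pairs):
    """Concatenate the snoRNA and disease similarity rows for each (i, j) pair."""
    return np.array([np.concatenate([sno_sim[i], dis_sim[j]]) for i, j in pairs])


def fit_gbdt_svm(X, y, n_trees=20, seed=0):
    """Fit a GBDT, encode its leaf indices, then train an SVM on the encoding."""
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=3, random_state=seed
    )
    gbdt.fit(X, y)
    # apply() returns (n_samples, n_estimators, 1) for binary targets.
    leaves = gbdt.apply(X)[:, :, 0]
    encoder = OneHotEncoder(handle_unknown="ignore")
    svm = SVC(kernel="rbf")
    svm.fit(encoder.fit_transform(leaves), y)
    return gbdt, encoder, svm


def predict_scores(model, X):
    """Return SVM decision values; larger means more likely associated."""
    gbdt, encoder, svm = model
    leaves = gbdt.apply(X)[:, :, 0]
    return svm.decision_function(encoder.transform(leaves))


# Toy data only: a random 12 x 8 snoRNA-disease association matrix.
rng = np.random.default_rng(0)
assoc = (rng.random((12, 8)) < 0.3).astype(int)
pairs = [(i, j) for i in range(assoc.shape[0]) for j in range(assoc.shape[1])]
labels = np.array([assoc[i, j] for i, j in pairs])

X = pair_features(gip_kernel(assoc), gip_kernel(assoc.T), pairs)
model = fit_gbdt_svm(X, labels)
scores = predict_scores(model, X)
print(scores.shape)
```

In a real evaluation the similarities would be computed from training associations only, and scoring would be done on held-out pairs; the toy run above scores its own training pairs purely to show the data flow.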
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.