DMGAT: predicting ncRNA-drug resistance associations based on diffusion map and heterogeneous graph attention network.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics, Pub Date: 2025-03-04, DOI: 10.1093/bib/bbaf179

Tingyu Liu, Qiuhao Chen, Renjie Liu, Yuzhi Sun, Yadong Wang, Yan Zhu, Tianyi Zhao

Volume 26, Issue 2. Open-access PDF (PMC12008124): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008124/pdf/

Citations: 0

Abstract

Non-coding RNAs (ncRNAs) play crucial roles in drug resistance and sensitivity, making them important biomarkers and therapeutic targets. However, predicting ncRNA-drug associations is challenging because of dataset imbalance and sparsity, which limits the identification of robust biomarkers. Existing models often fail to capture both local and global sequence information, which limits the reliability of their predictions. This study introduces DMGAT (diffusion map and heterogeneous graph attention network), a deep learning model for predicting ncRNA-drug associations. DMGAT integrates diffusion maps for sequence embedding, graph convolutional networks (GCNs) for feature extraction, and a graph attention network (GAT) for heterogeneous information fusion. To address dataset imbalance, the model incorporates sensitivity associations and uses a random forest classifier to select reliable negative samples. DMGAT embeds ncRNA sequences and drug SMILES strings with word2vec, capturing local and global sequence information. The model constructs a heterogeneous network by combining sequence similarity with Gaussian Interaction Profile (GIP) kernel similarity, providing a comprehensive representation of ncRNA-drug interactions. Evaluated by five-fold cross-validation on a curated dataset drawn from NoncoRNA and ncDR, DMGAT outperforms seven state-of-the-art methods, achieving the highest area under the receiver operating characteristic curve (AUC, 0.8964), area under the precision-recall curve (AUPR, 0.8984), recall (0.9576), and F1-score (0.8285). The raw data are deposited in Zenodo under identifier 13929676. The source code of DMGAT is available at https://github.com/liutingyu0616/DMGAT/tree/main.
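The abstract names two standard building blocks, GIP kernel similarity and diffusion maps, without giving their formulas. The following minimal NumPy sketch shows textbook versions of both. The GIP kernel is K(i, j) = exp(-γ ||IP(i) - IP(j)||²), where IP(i) is row i of a binary association matrix and γ = γ' / mean_i ||IP(i)||². The diffusion map embeds nodes with the leading non-trivial right eigenvectors of the Markov matrix P = D⁻¹W, scaled by λᵗ. This is not code from the DMGAT repository. The function names `gip_kernel` and `diffusion_map`, the default γ' = 1, the diffusion time `t`, and the toy matrix are illustrative assumptions, and the abstract does not specify exactly which similarity matrices DMGAT feeds to its diffusion map.

```python
import numpy as np

# Illustrative sketch only; hypothetical helper names, not the DMGAT implementation.


def gip_kernel(assoc: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian Interaction Profile kernel over the rows of a binary association matrix.

    assoc[i, j] = 1 if entity i (e.g. an ncRNA) is associated with entity j (e.g. a drug).
    gamma_prime = 1.0 is an assumed default bandwidth.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = np.sum(assoc ** 2, axis=1)  # ||IP(i)||^2 for each row
    mean_sq_norm = sq_norms.mean()
    # Bandwidth normalised by the mean squared profile norm; guard against an all-zero matrix.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else gamma_prime
    # Pairwise squared Euclidean distances between interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-gamma * sq_dists)


def diffusion_map(affinity: np.ndarray, n_components: int = 2, t: int = 1) -> np.ndarray:
    """Diffusion-map coordinates from a symmetric, non-negative affinity matrix.

    Assumes every node has a positive degree (true for a GIP kernel, whose diagonal is 1).
    """
    W = np.asarray(affinity, dtype=float)
    d = W.sum(axis=1)  # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric conjugate S = D^-1/2 W D^-1/2 has the same eigenvalues as P = D^-1 W.
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(S)  # ascending order
    order = np.argsort(eigvals)[::-1]  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = d_inv_sqrt[:, None] * eigvecs  # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by lambda^t.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
```

For a hypothetical ncRNA-by-drug matrix `A`, `gip_kernel(A)` gives ncRNA-ncRNA similarities and `gip_kernel(A.T)` gives drug-drug similarities. `diffusion_map` then turns either similarity matrix into low-dimensional coordinates. The sketch diagonalises the symmetric matrix S with `eigh` instead of the non-symmetric P because it is numerically more stable and guarantees real eigenvalues.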


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular biology, and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, its papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.