A heterogeneous graph transformer framework for accurate cancer driver gene prediction and downstream analysis

Impact factor 4.2; Zone 3 (Biology); JCR Q1, Biochemical Research Methods
Shuwen Xiong, Junming Zhang, Hong Luo, Yongqing Zhang, Qinyin Xiao
Methods, Volume 232, Pages 9-17. Published 2024-10-18. DOI: 10.1016/j.ymeth.2024.09.018. URL: https://www.sciencedirect.com/science/article/pii/S1046202324002160

Abstract

Accurately predicting cancer driver genes remains a formidable challenge given the growing volume and complexity of cancer genomic data. In this study, we propose HGTDG, a heterogeneous graph transformer framework for predicting cancer driver genes and exploring downstream tasks. Central to the framework is a heterogeneous graph construction module, which assembles a gene-protein heterogeneous network from Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and protein-protein interactions sourced from the STRING (search tool for recurring instances of neighboring genes) database. The framework also introduces a heterogeneous graph transformer module that uses multi-head attention to learn node embeddings. This module captures distinct representations for both nodes and edges, enriching the model's predictive capacity. The resulting node embeddings are then passed to a classification module that discriminates between driver and non-driver genes. Our experiments show that HGTDG outperforms existing methods on performance metrics including the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Furthermore, downstream analysis using the newly identified cancer driver genes underscores the efficacy and versatility of the proposed framework.
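
The two evaluation metrics named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' evaluation code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for illustration, and average precision is used here as one common estimator of AUPRC (the abstract does not say which estimator the paper uses). A label of 1 stands for a driver gene and 0 for a non-driver gene.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive (driver)
    scores higher than a randomly chosen negative; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of the precision values measured at the
    rank of each positive, after sorting by descending score.
    Assumes scores are distinct (tied scores keep their input order)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy predictions for six genes (illustrative values only).
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(auroc(toy_labels, toy_scores), 4))
print(round(average_precision(toy_labels, toy_scores), 4))
```

The pairwise AUROC loop is quadratic in the number of genes, which keeps the definition visible but would be replaced by a rank-based computation on real data. AUPRC is often reported alongside AUROC when positives are rare, because precision is sensitive to false positives among the top-ranked predictions.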
Source journal: Methods (Biology, Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches, with emphasis on clear, detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.