A feature extraction framework for discovering pan-cancer driver genes based on multi-omics data
Authors: Xiaomeng Xue, Feng Li, J. Shang, Lingyun Dai, Daohui Ge, Qianqian Ren
Journal: Quantitative Biology (journal article; impact factor 0.6; JCR Q4, Mathematical & Computational Biology)
Published: 2024-04-05
DOI: 10.1002/qub2.40 (https://doi.org/10.1002/qub2.40)
Citations: 0
Abstract
Identifying tumor driver genes supports accurate cancer diagnosis and treatment and plays a key role in precision oncology, alongside the study of gene signaling, regulation, and interactions with protein complexes. To address the challenge of distinguishing driver genes within large volumes of genomic data, we construct a feature extraction framework for discovering pan-cancer driver genes that combines multi-omics data (mutations, gene expression, copy number variants, and DNA methylation) with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information so that cancer-specific information is represented. From the resulting functional features, we extract three kinds of pan-cancer features: distribution features; TOPSIS features, computed with the ideal solution method; and SetExpan features, which rank pan-cancer data by average inverse rank. These features capture information shared across cancer types. Finally, we use the LightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision-recall curve (AUPRC) and performs better across different PPI networks. These results indicate that the framework is effective at predicting potential cancer genes and offers valuable insights for the diagnosis and treatment of tumors.
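The feature-extraction steps named in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything here is an assumption: network propagation is shown as a random walk with restart, TOPSIS uses equal weights with every column treated as a benefit criterion, and "average inverse rank" is read as the mean reciprocal rank across cancer types. The function names, the toy graph, and the scores are hypothetical, and the LightGBM classification step is left out.

```python
import math

def propagate(adj, scores, restart=0.3, iters=100, tol=1e-9):
    """Random walk with restart (an assumed form of network propagation).
    adj: node -> list of neighbours (symmetric); scores: node -> initial score."""
    nodes = list(adj)
    p0 = {n: scores.get(n, 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes on its score, split evenly over its edges.
            inflow = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        delta = max(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

def topsis(matrix):
    """Closeness of each row to the ideal solution, in [0, 1].
    Assumes equal weights and that larger values are better in every column."""
    norms = [math.sqrt(sum(v * v for v in col)) or 1.0 for col in zip(*matrix)]
    scaled = [[v / n for v, n in zip(row, norms)] for row in matrix]
    best = [max(col) for col in zip(*scaled)]
    worst = [min(col) for col in zip(*scaled)]
    closeness = []
    for row in scaled:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 0.0)
    return closeness

def mean_reciprocal_rank(matrix):
    """Average of 1/rank over columns; rank 1 is the highest value in a column.
    Ties are broken by row order."""
    n_rows = len(matrix)
    totals = [0.0] * n_rows
    cols = list(zip(*matrix))
    for col in cols:
        order = sorted(range(n_rows), key=lambda i: -col[i])
        for rank, i in enumerate(order, start=1):
            totals[i] += 1.0 / rank
    return [t / len(cols) for t in totals]

# Hypothetical toy PPI network (a path A-B-C-D) and two "cancer types".
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
genes = list(adj)
per_cancer_scores = [{"A": 1.0}, {"D": 1.0, "B": 0.5}]

propagated = [propagate(adj, s) for s in per_cancer_scores]
# One row per gene, one column per cancer type.
features = [[p[g] for p in propagated] for g in genes]
topsis_scores = topsis(features)
mrr_scores = mean_reciprocal_rank(features)

for g, t, r in zip(genes, topsis_scores, mrr_scores):
    print(f"{g}: TOPSIS={t:.3f}  MRR={r:.3f}")
```

In a pipeline of this shape, the per-gene TOPSIS and reciprocal-rank values would be joined with the distribution features into one feature vector per gene before being passed to a classifier.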
About the journal:
Quantitative Biology is an interdisciplinary journal that focuses on original research that uses quantitative approaches and technologies to analyze and integrate biological systems, construct and model engineered life systems, and gain a deeper understanding of the life sciences. It aims to provide a platform for not only the analysis but also the integration and construction of biological systems. It is a quarterly journal seeking to provide an inter- and multi-disciplinary forum for a broad blend of peer-reviewed academic papers in order to promote rapid communication and exchange between scientists in the East and the West. The content of Quantitative Biology will mainly focus on two broad and related areas:
· bioinformatics and computational biology, which focuses on information technologies and computational methodologies that can efficiently and accurately manipulate -omics data and transform molecular information into biological knowledge.
· systems and synthetic biology, which focuses on complex interactions in biological systems and the emergent functional properties, and on the design and construction of new biological functions and systems.
Its goal is to reflect the significant advances made in quantitatively investigating and modeling both natural and engineered life systems at the molecular and higher levels. The journal particularly encourages original papers that link novel theory with cutting-edge experiments, especially in the newly emerging and multi-disciplinary areas of research. The journal also welcomes high-quality reviews and perspective articles.