GBMPurity: A machine learning tool for estimating glioblastoma tumor purity from bulk RNA-sequencing data.
Morgan P H Thomas, Shoaib Ajaib, Georgette Tanner, Andrew J Bulpitt, Lucy F Stead
Neuro-Oncology, pp. 1458-1473, published 2025-07-30. DOI: 10.1093/neuonc/noaf026
Background: Glioblastoma (GBM) presents a significant clinical challenge due to its aggressive nature and extensive heterogeneity. Tumor purity, the proportion of malignant cells within a tumor, is an important covariate for understanding the disease: it can be directly clinically relevant, and it can obscure the signal from the malignant portion in molecular analyses of bulk samples. However, current methods for estimating tumor purity are nonspecific and technically demanding. We therefore aimed to build a reliable and accessible purity estimator for GBM.
Methods: We developed GBMPurity, a deep learning model specifically designed to estimate the purity of IDH-wildtype primary GBM from bulk RNA-sequencing (RNA-seq) data. The model was trained on pseudobulk tumors of known purity, simulated from labeled single-cell data obtained from the GBmap resource. We evaluated GBMPurity on independent datasets and compared its performance with that of several existing tools.
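The pseudobulk idea can be illustrated with a minimal sketch. This is not the GBMPurity training code: it assumes a raw cells-by-genes count matrix with one malignant/non-malignant label per cell, samples cells with replacement so that the malignant fraction matches a chosen purity, sums their counts to mimic a bulk sample, and normalizes to counts per million (CPM). The function name `simulate_pseudobulk`, the cell count, and the CPM step are illustrative assumptions.

```python
import numpy as np

def simulate_pseudobulk(counts, is_malignant, purity, n_cells=500, rng=None):
    """Hypothetical helper: build one pseudobulk sample of known purity.

    counts: (cells x genes) raw count matrix from labeled single-cell data.
    is_malignant: boolean array with one label per cell.
    purity: target fraction of malignant cells, in [0, 1].
    """
    rng = np.random.default_rng(rng)
    is_malignant = np.asarray(is_malignant, dtype=bool)
    malignant_idx = np.flatnonzero(is_malignant)
    normal_idx = np.flatnonzero(~is_malignant)

    # The number of malignant cells is fixed by the requested purity.
    n_malignant = int(round(purity * n_cells))
    n_normal = n_cells - n_malignant

    # Sample cells with replacement so that small cell pools still work.
    chosen = np.concatenate([
        rng.choice(malignant_idx, size=n_malignant, replace=True),
        rng.choice(normal_idx, size=n_normal, replace=True),
    ])

    # Summing counts over the sampled cells mimics a bulk RNA-seq profile.
    bulk = counts[chosen].sum(axis=0).astype(float)

    # Assumed preprocessing: scale to counts per million.
    cpm = bulk / bulk.sum() * 1e6

    # The label is the realized malignant-cell fraction of this sample.
    true_purity = n_malignant / n_cells
    return cpm, true_purity


# Usage on toy data: 200 cells x 50 genes, the first 120 cells labeled malignant.
toy_counts = np.random.default_rng(0).poisson(2.0, size=(200, 50))
toy_labels = np.arange(200) < 120
cpm, p = simulate_pseudobulk(toy_counts, toy_labels, purity=0.7, n_cells=100, rng=1)
```

Repeating this over many target purities yields (expression profile, purity) pairs on which a regression model can be trained; the actual sampling scheme, cell numbers, and normalization used for GBMPurity may differ.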
Results: GBMPurity outperformed existing tools, achieving a mean absolute error of 0.15 and a concordance correlation coefficient of 0.88 on the validation datasets. We demonstrate the utility of GBMPurity through inference on bulk RNA-seq samples and observe lower purity in the proneural molecular subtype than in the classical subtype, which we attribute to a greater presence of healthy brain cells.
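The two reported metrics can be computed as in the sketch below. Mean absolute error is the average of |predicted - true| purity. Lin's concordance correlation coefficient is 2·cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))²). Unlike Pearson correlation, it also penalizes systematic offsets between predicted and true purity. Using population (ddof=0) moments is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def mean_absolute_error(true, pred):
    """Average absolute difference between predicted and true purity."""
    true, pred = np.asarray(true, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - true)))

def concordance_correlation(true, pred):
    """Lin's concordance correlation coefficient, using population moments."""
    true, pred = np.asarray(true, float), np.asarray(pred, float)
    mu_t, mu_p = true.mean(), pred.mean()
    var_t, var_p = true.var(), pred.var()  # ddof=0 (population variance)
    cov = np.mean((true - mu_t) * (pred - mu_p))
    # The squared mean difference term penalizes constant bias.
    return float(2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2))


# Usage: predictions that are uniformly 0.1 too high.
true_purity = [0.2, 0.5, 0.8]
pred_purity = [0.3, 0.6, 0.9]
mae = mean_absolute_error(true_purity, pred_purity)
ccc = concordance_correlation(true_purity, pred_purity)
```

In this toy case Pearson correlation would be exactly 1, but the CCC falls below 1 because of the constant 0.1 offset. This is why the CCC is a stricter measure of agreement for a purity estimator.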
Conclusions: GBMPurity is a reliable and accessible tool for estimating tumor purity from bulk RNA-seq data; it aids the interpretation of such data and offers valuable insights into GBM biology. To facilitate its use by the wider research community, GBMPurity is available as a web-based tool at: https://gbmdeconvoluter.leeds.ac.uk/.
About the journal:
Neuro-Oncology, the official journal of the Society for Neuro-Oncology, has been published monthly since January 2010. Affiliated with the Japan Society for Neuro-Oncology and the European Association of Neuro-Oncology, it is a global leader in the field.
The journal is committed to swiftly disseminating high-quality information across all areas of neuro-oncology. It features peer-reviewed articles, reviews, symposia on various topics, abstracts from annual meetings, and updates from neuro-oncology societies worldwide.