Identification of IGFBP3 and LGALS1 as potential secreted biomarkers for clear cell renal cell carcinoma based on bioinformatics analysis and machine learning.
Authors: Wunchana Seubwai, Sakkarn Sangkhamanon, Xuhong Zhang
Journal: Advances in Clinical and Experimental Medicine (Impact Factor 2.1; JCR Q3, Medicine, Research & Experimental)
Publication date: 2025-01-14
DOI: 10.17219/acem/194036 (https://doi.org/10.17219/acem/194036)
Abstract
Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma (RCC). Due to the lack of symptoms until advanced stages, early diagnosis of ccRCC is challenging. Therefore, the identification of novel secreted biomarkers for the early detection of ccRCC is urgently needed.
Objectives: This study aimed to identify novel secreted biomarkers for diagnosing ccRCC using bioinformatics and machine learning techniques based on transcriptomics data.
Material and methods: Differentially expressed genes (DEGs) in ccRCC compared to normal kidney tissues were identified using 3 transcriptomics datasets (GSE53757, GSE40435 and GSE11151) from the Gene Expression Omnibus (GEO). The DEGs common to all 3 datasets were then screened for potential secreted biomarkers using a list of human secretome proteins from The Human Protein Atlas. The recursive feature elimination (RFE) technique was used to determine the optimal number of features for building classification machine learning models. The expression levels and clinical associations of the candidate biomarkers identified with RFE were validated using transcriptomics data from The Cancer Genome Atlas (TCGA). Classification models were then developed based on the expression levels of these candidate biomarkers. Model performance was evaluated using accuracy, other evaluation metrics, confusion matrices, and receiver operating characteristic (ROC) curves with the area under the curve (AUC).
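The screening step described above (DEGs shared by all datasets, then filtered against a secretome list) amounts to two set operations. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the gene lists are hypothetical placeholders (only IGFBP3 and LGALS1 are named in the abstract; the `GENE_*` symbols are invented for illustration), and the function names `common_degs` and `secreted_candidates` are illustrative.

```python
# Hypothetical DEG lists per GEO dataset. These are placeholders for
# illustration only, not the study's actual differential expression results.
degs_by_dataset = {
    "GSE53757": {"IGFBP3", "LGALS1", "GENE_A", "GENE_B", "GENE_C"},
    "GSE40435": {"IGFBP3", "LGALS1", "GENE_A", "GENE_B", "GENE_D"},
    "GSE11151": {"IGFBP3", "LGALS1", "GENE_A", "GENE_B", "GENE_E"},
}

# Hypothetical excerpt of a secretome gene list (assumed for this sketch).
secretome = {"IGFBP3", "LGALS1", "GENE_A", "GENE_E"}


def common_degs(degs_by_dataset):
    """Return the genes reported as differentially expressed in every dataset."""
    gene_sets = list(degs_by_dataset.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def secreted_candidates(common, secretome):
    """Keep only the common DEGs that also appear in the secretome list."""
    return sorted(common & secretome)


common = common_degs(degs_by_dataset)
candidates = secreted_candidates(common, secretome)
print(sorted(common))
print(candidates)
```

In this toy input, `GENE_B` is shared by all datasets but absent from the secretome list, and `GENE_E` is in the secretome list but not shared, so neither survives both filters.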
Results: Of the 274 DEGs common to all datasets, 44 encode potential secreted proteins. Among these, the RFE technique selected insulin-like growth factor binding protein 3 (IGFBP3) and lectin, galactoside-binding, soluble, 1 (LGALS1) for further analysis. Both IGFBP3 and LGALS1 were significantly upregulated in ccRCC tissues compared to normal tissues in the GEO and TCGA datasets. Survival analysis indicated that patients with higher expression levels of these genes had shorter overall survival (OS) and disease-free survival (DFS). Decision tree and random forest models based on IGFBP3 and LGALS1 levels achieved an accuracy of 98.04% and an AUC of 0.98.
Conclusions: This study identified IGFBP3 and LGALS1 as promising novel secreted biomarkers for ccRCC diagnosis.
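For readers unfamiliar with the evaluation measures named in the abstract, the sketch below computes a confusion matrix, accuracy, and ROC AUC for a binary tumor-versus-normal classifier. It is an illustration only: the labels and scores are made-up toy values, not study data, and the AUC is computed with the rank-based (pairwise comparison) definition rather than with whatever software the authors used.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = tumor and 0 = normal."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold (0.5 here), whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.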
About the journal:
Advances in Clinical and Experimental Medicine has been published by the Wroclaw Medical University since 1992. Establishing the medical journal was the idea of Prof. Bogumił Halawa, Chair of the Department of Cardiology, and was fully supported by the Rector of Wroclaw Medical University, Prof. Zbigniew Knapik. Prof. Halawa was also the first editor-in-chief, from 1992 to 1997. The journal, then entitled "Postępy Medycyny Klinicznej i Doświadczalnej", appeared quarterly.
Prof. Leszek Paradowski was editor-in-chief from 1997 to 1999. In 1998 he initiated changes to the journal's profile and cover design, which were accepted by the Editorial Board. The title was changed to Advances in Clinical and Experimental Medicine. Articles in English were welcomed. A number of outstanding representatives of medical science from Poland and abroad were invited to participate in the newly established International Editorial Staff.
Prof. Antonina Harłozińska-Szmyrka was editor-in-chief from 2000 to 2005, followed once again by Prof. Leszek Paradowski from 2006 to 2007 and by Prof. Maria Podolak-Dawidziak from 2008 to 2016. Since 2017 the editor-in-chief has been Prof. Maciej Bagłaj.
Since July 2005, original papers have been published only in English. Case reports are no longer accepted. The manuscripts are reviewed by two independent reviewers and a statistical reviewer, and English texts are proofread by a native speaker.
The journal has been indexed in several databases: Scopus, Ulrich's™ International Periodicals Directory, Index Copernicus and, since 2007, the Thomson Reuters databases Science Citation Index Expanded and Journal Citation Reports/Science Edition.
In 2010 the journal obtained an Impact Factor, which is now 1.179. Articles published in the journal are worth 15 points among Polish journals according to the Polish Committee for Scientific Research and 169.43 points according to the Index Copernicus.
Since November 7, 2012, Advances in Clinical and Experimental Medicine has been indexed and included in the National Library of Medicine's MEDLINE database. English abstracts printed in the journal are included and searchable via PubMed (http://www.ncbi.nlm.nih.gov/pubmed).