Integrative Machine Learning and Bioinformatics Approach for Identifying Key Biomarkers in Gallbladder Cancer Diagnosis and Progression

Impact factor 1.9; Zone 4 (Biology); JCR Q4 (Cell Biology)
Rabea Khatun, Wahia Tasnim, Maksuda Akter, Md. Manowarul Islam, Md. Ashraf Uddin, Saurav Chandra Das, Md. Zulfiker Mahmud
IET Systems Biology, vol. 19, no. 1. Journal Article. Published 2025-06-17. DOI: 10.1049/syb2.70022. https://onlinelibrary.wiley.com/doi/10.1049/syb2.70022

Abstract

Gallbladder cancer (GBC) is the most common biliary tract neoplasm. Identifying biomarkers for GBC initiation and progression remains a challenge. This study aimed to identify GBC biomarkers using machine learning and bioinformatics. Differentially expressed genes (DEGs) were identified from two microarray datasets (GSE100363, GSE139682) from the GEO database. Gene Ontology and pathway analyses were performed using DAVID. A protein–protein interaction network was constructed using STRING, and hub genes were identified via three ranking algorithms (degree, maximum neighborhood component (MNC) and closeness centrality). Feature selection methods (Pearson correlation, recursive feature elimination) were applied to extract key gene subsets. Machine learning models (support vector machine (SVM), naive Bayes (NB) and random forest (RF)) were trained on GSE100363 and validated on GSE139682 to assess predictive performance. Biomarkers were further validated using the GEPIA database. A total of 146 DEGs were identified, including 39 upregulated and 107 downregulated genes. Eleven hub genes were identified, of which SLIT3, COL7A1 and CLDN4 were strongly correlated with GBC. Machine learning results confirmed their diagnostic potential. The study highlights NTRK2, COL14A1, SCN4B, ATP1A2, SLC17A7, SLIT3, COL7A1, CLDN4, CLEC3B, ADCYAP1R1 and MFAP4 as crucial genes associated with GBC. SLIT3, COL7A1 and CLDN4 serve as highly predictive biomarkers, and these findings may improve early diagnosis and prognosis, aiding clinical decision-making.
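
The hub-gene step in the abstract (ranking network nodes by degree, MNC and closeness centrality) can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' method: MNC is taken to mean maximum neighborhood component (the size of the largest connected component among a node's neighbours), the three rankings are combined by intersecting their top-k lists, and the edge list is a hypothetical toy network with placeholder node names, not real STRING interactions.

```python
from collections import deque

# Hypothetical toy PPI network (undirected edges). Node names are placeholders,
# not genes or interactions taken from STRING or from the study.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]

def build_graph(edges):
    """Build an adjacency map {node: set of neighbours} from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def degree(graph, node):
    """Number of direct interaction partners."""
    return len(graph[node])

def closeness(graph, node):
    """(Reachable nodes) / (sum of shortest-path distances), computed by BFS."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def mnc(graph, node):
    """Assumed MNC: size of the largest connected component of the subgraph
    induced by the node's neighbours (the node itself is excluded)."""
    neighbours = graph[node]
    seen, best = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            cur = stack.pop()
            size += 1
            # Only follow edges that stay inside the neighbourhood.
            for nxt in graph[cur] & neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

def hub_genes(graph, top_k):
    """Nodes that fall in the top_k of all three rankings (assumed rule)."""
    def top(score):
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(graph, key=lambda n: (-score(graph, n), n))
        return set(ranked[:top_k])
    return top(degree) & top(closeness) & top(mnc)

graph = build_graph(EDGES)
hubs = hub_genes(graph, top_k=3)
print(sorted(hubs))
```

On this toy network only node `A` ranks in the top three under all three measures. Intersecting the rankings is one conservative way to combine them; taking a union or a summed rank would be equally plausible, and the abstract does not say which rule was used.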

Source journal: IET Systems Biology (Biology: Mathematical and Computational Biology)
CiteScore: 4.20
Self-citation rate: 4.30%
Articles published: 17
Review time: >12 weeks
About the journal: IET Systems Biology covers intra- and inter-cellular dynamics, using systems- and signal-oriented approaches. Papers that analyse genomic data in order to identify variables and basic relationships between them are considered if the results provide a basis for mathematical modelling and simulation of cellular dynamics. Manuscripts on molecular and cell biological studies are encouraged if the aim is a systems approach to dynamic interactions within and between cells. The scope includes the following topics: genomics, transcriptomics, proteomics, metabolomics, cells, tissue and the physiome; molecular and cellular interaction, gene, cell and protein function; networks and pathways; metabolism and cell signalling; dynamics, regulation and control; systems, signals, and information; experimental data analysis; mathematical modelling, simulation and theoretical analysis; biological modelling, simulation, prediction and control; methodologies, databases, tools and algorithms for modelling and simulation; modelling, analysis and control of biological networks; synthetic biology and bioengineering based on systems biology.