Integrative Machine Learning and Bioinformatics Approach for Identifying Key Biomarkers in Gallbladder Cancer Diagnosis and Progression
Authors: Rabea Khatun, Wahia Tasnim, Maksuda Akter, Md. Manowarul Islam, Md. Ashraf Uddin, Saurav Chandra Das, Md. Zulfiker Mahmud
Journal: IET Systems Biology, vol. 19, no. 1 (published 2025-06-17)
DOI: 10.1049/syb2.70022
URL: https://onlinelibrary.wiley.com/doi/10.1049/syb2.70022
Citations: 0
Abstract
Gallbladder cancer (GBC) is the most common biliary tract neoplasm. Identifying biomarkers for GBC initiation and progression remains a challenge. This study aimed to identify GBC biomarkers using machine learning and bioinformatics. Differentially expressed genes (DEGs) were identified from two microarray datasets (GSE100363, GSE139682) from the GEO database. Gene Ontology and pathway analyses were performed using DAVID. A protein–protein interaction network was constructed using STRING, and hub genes were identified via three ranking algorithms (degree, MNC and closeness centrality). Feature selection methods (Pearson correlation, recursive feature elimination) were applied to extract key gene subsets. Machine learning models (SVM, NB and RF) were trained on GSE100363 and validated on GSE139682 to assess predictive performance. Biomarkers were further validated using the GEPIA database. A total of 146 DEGs were identified, including 39 upregulated and 107 downregulated genes. Eleven hub genes were identified, with SLIT3, COL7A1 and CLDN4 strongly correlated with GBC. Machine learning results confirmed their diagnostic potential. The study highlights NTRK2, COL14A1, SCN4B, ATP1A2, SLC17A7, SLIT3, COL7A1, CLDN4, CLEC3B, ADCYAP1R1 and MFAP4 as crucial genes associated with GBC. SLIT3, COL7A1 and CLDN4 serve as highly predictive biomarkers, and findings can improve early diagnosis and prognosis, aiding clinical decision-making.
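The hub-gene step described in the abstract ranks nodes of a protein–protein interaction network by centrality. The following is a minimal sketch of two of the three named measures (degree and closeness centrality) on a hypothetical toy network. The node names `G1` to `G5` and their edges are placeholders, not STRING data or results from the study, and MNC is omitted. Intersecting the top-ranked lists is one common way to combine rankings and is an assumption here; the abstract does not state how the three rankings were combined.

```python
from collections import deque

# Hypothetical toy interaction network; edges are illustrative only,
# not taken from STRING or from the study.
EDGES = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G4", "G5"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Unnormalised degree: the number of direct interaction partners."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path distances), found by BFS."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_k(scores, k):
    """Names of the k highest-scoring nodes; ties are broken alphabetically."""
    return set(sorted(scores, key=lambda n: (-scores[n], n))[:k])


adj = build_adjacency(EDGES)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)

# Assumed combination rule: keep nodes ranked in the top 2 by both measures.
hubs = top_k(degree, 2) & top_k(closeness, 2)
print(sorted(hubs))
```

On this toy network the two measures disagree on the second-ranked node, so only `G1` survives the intersection. Requiring agreement across measures in this way tends to shorten the candidate list.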
About the journal:
IET Systems Biology covers intra- and inter-cellular dynamics, using systems- and signal-oriented approaches. Papers that analyse genomic data in order to identify variables and basic relationships between them are considered if the results provide a basis for mathematical modelling and simulation of cellular dynamics. Manuscripts on molecular and cell biological studies are encouraged if the aim is a systems approach to dynamic interactions within and between cells.
The scope includes the following topics:
Genomics, transcriptomics, proteomics, metabolomics, cells, tissue and the physiome; molecular and cellular interaction, gene, cell and protein function; networks and pathways; metabolism and cell signalling; dynamics, regulation and control; systems, signals, and information; experimental data analysis; mathematical modelling, simulation and theoretical analysis; biological modelling, simulation, prediction and control; methodologies, databases, tools and algorithms for modelling and simulation; modelling, analysis and control of biological networks; synthetic biology and bioengineering based on systems biology.