使用机器学习集成集成特征选择的生物标志物驱动药物用于nafld相关的肝细胞癌。

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics Pub Date : 2025-04-17 eCollection Date: 2025-01-01 DOI:10.3389/fbinf.2025.1522401

Subhajit Ghosh, Sukhen Das Mandal, Subarna Thakur

{"title":"使用机器学习集成集成特征选择的生物标志物驱动药物用于nafld相关的肝细胞癌。","authors":"Subhajit Ghosh, Sukhen Das Mandal, Subarna Thakur","doi":"10.3389/fbinf.2025.1522401","DOIUrl":null,"url":null,"abstract":"The incidence of non-alcoholic fatty liver disease (NAFLD), encompassing the more severe non-alcoholic steatohepatitis (NASH), is rising alongside the surges in diabetes and obesity. Increasing evidence indicates that NASH is responsible for a significant share of idiopathic hepatocellular carcinoma (HCC) cases, a fatal cancer with a 5-year survival rate below 22%. Biomarkers can facilitate early screening and monitoring of at-risk NAFLD/NASH patients and assist in identifying potential drug candidates for treatment. This study utilized an ensemble feature selection framework to analyze transcriptomic data, identifying biomarker genes associated with the stage-wise progression of NAFLD-related HCC. Seven machine learning algorithms were assessed for disease stage classification. Twelve feature selection methods including correlation-based techniques, mutual information-based methods, and embedded techniques were utilized to rank the top genes as features, through this approach, multiple feature selection methods were combined to yield more robust features important in this disease progression. Cox regression-based survival analysis was carried out to evaluate the biomarker potentiality of these genes. Furthermore, multiphase drug repurposing strategy and molecular docking were employed to identify potential drug candidates against these biomarkers. Among the seven machine learning models initially evaluated, DISCR resulted as the most accurate disease stage classifier. Ensemble feature selection identified ten top genes, among which eight were recognized as potential biomarkers based on survival analysis. These include genes ABAT, ABCB11, MBTPS1, and ZFP1 mostly involved in alanine and glutamate metabolism, butanoate metabolism, and ER protein processing. Through drug repurposing, 81 candidate drugs were found to be effective against these markers genes, with Diosmin, Esculin, Lapatinib, and Phenelzine as the best candidates screened through molecular docking and MMGBSA. The consensus derived from multiple methods enhances the accuracy of identifying relevant robust biomarkers for NAFLD-associated HCC. The use of these biomarkers in a multiphase drug repurposing strategy highlights potential therapeutic options for early intervention, which is essential to stop disease progression and improve outcomes.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1522401"},"PeriodicalIF":3.9000,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12043677/pdf/","citationCount":"0","resultStr":"{\"title\":\"Biomarker-driven drug repurposing for NAFLD-associated hepatocellular carcinoma using machine learning integrated ensemble feature selection.\",\"authors\":\"Subhajit Ghosh, Sukhen Das Mandal, Subarna Thakur\",\"doi\":\"10.3389/fbinf.2025.1522401\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The incidence of non-alcoholic fatty liver disease (NAFLD), encompassing the more severe non-alcoholic steatohepatitis (NASH), is rising alongside the surges in diabetes and obesity. Increasing evidence indicates that NASH is responsible for a significant share of idiopathic hepatocellular carcinoma (HCC) cases, a fatal cancer with a 5-year survival rate below 22%. Biomarkers can facilitate early screening and monitoring of at-risk NAFLD/NASH patients and assist in identifying potential drug candidates for treatment. This study utilized an ensemble feature selection framework to analyze transcriptomic data, identifying biomarker genes associated with the stage-wise progression of NAFLD-related HCC. Seven machine learning algorithms were assessed for disease stage classification. Twelve feature selection methods including correlation-based techniques, mutual information-based methods, and embedded techniques were utilized to rank the top genes as features, through this approach, multiple feature selection methods were combined to yield more robust features important in this disease progression. Cox regression-based survival analysis was carried out to evaluate the biomarker potentiality of these genes. Furthermore, multiphase drug repurposing strategy and molecular docking were employed to identify potential drug candidates against these biomarkers. Among the seven machine learning models initially evaluated, DISCR resulted as the most accurate disease stage classifier. Ensemble feature selection identified ten top genes, among which eight were recognized as potential biomarkers based on survival analysis. These include genes ABAT, ABCB11, MBTPS1, and ZFP1 mostly involved in alanine and glutamate metabolism, butanoate metabolism, and ER protein processing. Through drug repurposing, 81 candidate drugs were found to be effective against these markers genes, with Diosmin, Esculin, Lapatinib, and Phenelzine as the best candidates screened through molecular docking and MMGBSA. The consensus derived from multiple methods enhances the accuracy of identifying relevant robust biomarkers for NAFLD-associated HCC. The use of these biomarkers in a multiphase drug repurposing strategy highlights potential therapeutic options for early intervention, which is essential to stop disease progression and improve outcomes.\",\"PeriodicalId\":73066,\"journal\":{\"name\":\"Frontiers in bioinformatics\",\"volume\":\"5 \",\"pages\":\"1522401\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12043677/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fbinf.2025.1522401\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fbinf.2025.1522401","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

非酒精性脂肪性肝病（NAFLD）的发病率，包括更严重的非酒精性脂肪性肝炎（NASH），随着糖尿病和肥胖症的激增而上升。越来越多的证据表明，NASH是特发性肝细胞癌（HCC）病例的重要原因，HCC是一种5年生存率低于22%的致命癌症。生物标志物可以促进早期筛查和监测高危NAFLD/NASH患者，并协助确定潜在的候选治疗药物。本研究利用集合特征选择框架分析转录组学数据，确定与nafld相关HCC分期进展相关的生物标志物基因。评估了7种机器学习算法用于疾病分期分类。利用基于相关性的方法、基于互信息的方法和嵌入技术等12种特征选择方法对顶级基因进行特征排序，通过这种方法，多种特征选择方法相结合，产生在该疾病进展中更重要的鲁棒特征。采用基于Cox回归的生存分析来评估这些基因的生物标志物潜力。此外，采用多相药物再利用策略和分子对接来识别针对这些生物标志物的潜在候选药物。在最初评估的七个机器学习模型中，DISCR是最准确的疾病分期分类器。集合特征选择筛选出10个顶级基因，其中8个被认为是基于生存分析的潜在生物标志物。这些基因包括ABAT、ABCB11、MBTPS1和ZFP1，主要参与丙氨酸和谷氨酸代谢、丁酸代谢和内质网蛋白加工。通过药物再利用，发现81种候选药物对这些标记基因有效，通过分子对接和MMGBSA筛选到的最佳候选药物为Diosmin、Esculin、Lapatinib和Phenelzine。从多种方法中得出的共识提高了识别nafld相关HCC相关生物标志物的准确性。在多相药物再利用策略中使用这些生物标志物突出了早期干预的潜在治疗选择，这对于阻止疾病进展和改善结果至关重要。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Biomarker-driven drug repurposing for NAFLD-associated hepatocellular carcinoma using machine learning integrated ensemble feature selection.

The incidence of non-alcoholic fatty liver disease (NAFLD), encompassing the more severe non-alcoholic steatohepatitis (NASH), is rising alongside the surges in diabetes and obesity. Increasing evidence indicates that NASH is responsible for a significant share of idiopathic hepatocellular carcinoma (HCC) cases, a fatal cancer with a 5-year survival rate below 22%. Biomarkers can facilitate early screening and monitoring of at-risk NAFLD/NASH patients and assist in identifying potential drug candidates for treatment. This study utilized an ensemble feature selection framework to analyze transcriptomic data, identifying biomarker genes associated with the stage-wise progression of NAFLD-related HCC. Seven machine learning algorithms were assessed for disease stage classification. Twelve feature selection methods including correlation-based techniques, mutual information-based methods, and embedded techniques were utilized to rank the top genes as features, through this approach, multiple feature selection methods were combined to yield more robust features important in this disease progression. Cox regression-based survival analysis was carried out to evaluate the biomarker potentiality of these genes. Furthermore, multiphase drug repurposing strategy and molecular docking were employed to identify potential drug candidates against these biomarkers. Among the seven machine learning models initially evaluated, DISCR resulted as the most accurate disease stage classifier. Ensemble feature selection identified ten top genes, among which eight were recognized as potential biomarkers based on survival analysis. These include genes ABAT, ABCB11, MBTPS1, and ZFP1 mostly involved in alanine and glutamate metabolism, butanoate metabolism, and ER protein processing. Through drug repurposing, 81 candidate drugs were found to be effective against these markers genes, with Diosmin, Esculin, Lapatinib, and Phenelzine as the best candidates screened through molecular docking and MMGBSA. The consensus derived from multiple methods enhances the accuracy of identifying relevant robust biomarkers for NAFLD-associated HCC. The use of these biomarkers in a multiphase drug repurposing strategy highlights potential therapeutic options for early intervention, which is essential to stop disease progression and improve outcomes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in bioinformatics

CiteScore

2.60

自引率

0.00%

发文量