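Differential gene expression analysis and machine learning identified structural, TFs, cytokine and glycoproteins, including SOX2, TOP2A, SPP1, COL1A1, and TIMP1 as potential drivers of lung cancer.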

Impact factor: 2.0; CAS Zone 4 (Medicine); JCR Q3 (Biotechnology & Applied Microbiology)
Biomarkers Pub Date : 2025-03-01 Epub Date: 2025-02-10 DOI:10.1080/1354750X.2025.2461698
Syed Naseer Ahmad Shah, Rafat Parveen
Citations: 0

Differential gene expression analysis and machine learning identified structural, TFs, cytokine and glycoproteins, including SOX2, TOP2A, SPP1, COL1A1, and TIMP1 as potential drivers of lung cancer.

Background: Lung cancer is a major global health concern, responsible for a considerable portion of cancer-related deaths worldwide. Understanding its molecular complexity is crucial for identifying potential treatment targets. The goal is to slow disease progression and intervene early, before advanced lung cancer develops. Hence, there is an urgent need for new biomarkers that can detect lung cancer in its early stages.

Methods: The study performed RNA-Seq analysis of lung cancer samples from the publicly available SRA database (NCBI SRP009408), including both control and tumour samples. Genes differentially expressed between tumour and healthy tissue were identified using R and Bioconductor. Five machine learning (ML) techniques (Random Forest, Lasso, XGBoost, Gradient Boosting and Elastic Net) were used to pinpoint significant genes, followed by three classifiers: Multilayer Perceptron (MLP), Support Vector Machines (SVM) and k-Nearest Neighbours (k-NN). Gene ontology and pathway analyses were performed on the significant differentially expressed genes (DEGs). The top genes from the DEG and ML analyses were combined for protein-protein interaction (PPI) analysis, which identified 10 hub genes essential for lung cancer progression.
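
The final Methods step, pooling the DEG and ML gene lists and then looking for hub genes in a PPI network, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how hub genes were scored, so ranking by degree (number of interaction partners) is assumed here, and the gene names, edge list, and function names are all hypothetical toy values rather than data from the study.

```python
def combine_gene_lists(deg_genes, ml_gene_lists):
    """Return the union of the DEG genes and every ML-selected gene list."""
    combined = set(deg_genes)
    for genes in ml_gene_lists:
        combined.update(genes)
    return combined


def hub_genes_by_degree(edges, candidates, top_n):
    """Rank candidate genes by their number of distinct partners.

    Only interactions where both ends are candidate genes are counted.
    Degree is an assumed hub score; the source does not name one.
    Ties are broken alphabetically so the output is deterministic.
    """
    neighbours = {gene: set() for gene in candidates}
    for a, b in edges:
        if a in neighbours and b in neighbours and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(gene, len(neighbours[gene])) for gene in ranked[:top_n]]


# Hypothetical toy inputs (not taken from the paper).
deg = ["geneA", "geneB", "geneC"]
ml = [["geneB", "geneD"], ["geneA", "geneE"]]
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneD", "geneX"),  # ignored: geneX is not a candidate gene
]

candidates = combine_gene_lists(deg, ml)
hubs = hub_genes_by_degree(toy_edges, candidates, top_n=2)
print(hubs)  # [('geneA', 3), ('geneB', 2)]
```

In practice the edge list would come from a PPI resource and the hub score might be a different centrality measure; the sketch only shows the shape of the "combine, then rank" step.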

Results: The integrated analysis of ML and DEGs revealed the significance of specific genes in lung cancer samples. It identified the top 5 upregulated genes (COL11A1, TOP2A, SULF1, DIO2, MIR196A2) and the top 5 downregulated genes (PDK4, FOSB, FLYWCH1, CYB5D2, MIR328), along with associated genes implicated in the same pathways or co-expression networks. Among the algorithms employed, Random Forest and XGBoost proved effective in identifying common genes, underscoring the potential significance of these genes in lung cancer pathogenesis. The MLP exhibited the highest accuracy in classifying samples using all genes. Additionally, the PPI analysis identified 10 hub genes that are pivotal in lung cancer pathogenesis: COL1A1, SOX2, SPP1, THBS2, POSTN, COL5A1, COL11A1, TIMP1, TOP2A and PKP1.
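
To make "top upregulated" and "top downregulated" concrete, the sketch below ranks genes by log2 fold change between tumour and control. It is a simplified stand-in, not the authors' method: the abstract says only that R and Bioconductor were used, where packages such as DESeq2 or edgeR also model count dispersion and test for significance. The gene names, counts, pseudocount, and function names are hypothetical.

```python
import math


def log2_fold_changes(tumour, control, pseudocount=1.0):
    """Compute log2((mean tumour + pc) / (mean control + pc)) per gene.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    result = {}
    for gene in tumour:
        mean_tumour = sum(tumour[gene]) / len(tumour[gene])
        mean_control = sum(control[gene]) / len(control[gene])
        ratio = (mean_tumour + pseudocount) / (mean_control + pseudocount)
        result[gene] = math.log2(ratio)
    return result


def top_regulated(lfc, n):
    """Return the n most upregulated and n most downregulated genes."""
    ranked = sorted(lfc, key=lambda g: lfc[g], reverse=True)
    up = [g for g in ranked if lfc[g] > 0][:n]
    down = [g for g in reversed(ranked) if lfc[g] < 0][:n]
    return up, down


# Hypothetical normalised counts for two samples per group.
tumour = {"gene1": [31, 31], "gene2": [3, 3], "gene3": [7, 7]}
control = {"gene1": [7, 7], "gene2": [15, 15], "gene3": [7, 7]}

lfc = log2_fold_changes(tumour, control)
up, down = top_regulated(lfc, n=1)
print(lfc)       # {'gene1': 2.0, 'gene2': -2.0, 'gene3': 0.0}
print(up, down)  # ['gene1'] ['gene2']
```

A fold change alone says nothing about statistical significance, which is why a real analysis would also filter on an adjusted p-value.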

Conclusion: The study contributes to the early prediction of lung cancer by identifying potential biomarkers that could enhance early diagnosis and pave the way for practical clinical applications in the future. Integrating DEGs and machine learning-derived significant genes for PPI analysis offers a robust approach to uncovering critical molecular targets for lung cancer treatment.

Source journal: Biomarkers (Medicine: Toxicology)
CiteScore: 5.00; self-citation rate: 3.80%; articles published: 140; review time: 3 months
Journal description: The journal Biomarkers brings together all aspects of the rapidly growing field of biomarker research, encompassing their various uses and applications in one essential source. Biomarkers provides a vital forum for the exchange of ideas and concepts in all areas of biomarker research. High quality papers are accepted in four main areas: biomarkers of disease, biomarkers of exposure, biomarkers of response, and biomarkers of susceptibility. Manuscripts describing novel biomarkers and their subsequent validation are especially encouraged. Manuscripts can describe biomarkers measured in humans or other animals in vivo or in vitro. Biomarkers will consider publishing negative data from studies of biomarkers of susceptibility in human populations.