Identification of Plausible Candidates in Prostate Cancer Using Integrated Machine Learning Approaches

IF 1.4 · Region 4 (Biology) · Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY

Current Genomics · Pub Date: 2023-11-22 · DOI: 10.2174/0113892029240239231109082805

Bhumandeep Kour, Nidhi Shukla, Harshita Bhargava, Devendra Sharma, Amita Sharma, Jayaraman Valadi, TC Sadasukhi, Sugunakar Vuree, Prashanth Suravajhala


Citations: 0

Abstract


Identification of Plausible Candidates in Prostate Cancer Using Integrated Machine Learning Approaches

Background: Prostate-specific antigen (PSA) is currently the most commonly used prostate cancer (PCa) biomarker. However, PSA is linked to factors that frequently lead to false-positive results and even needless biopsies in elderly people.

Objectives: In this pilot study, we mined potential genes and mutations from several databases and checked whether any putative prognostic biomarkers are central to the annotation. The aim of the study was to develop a risk prediction model that could help in clinical decision-making.

Methods: An extensive literature review was conducted, and clinical parameters were collected for PCa and related comorbidities such as diabetes and obesity. These parameters were chosen on the understanding that variations in their threshold values could hasten the complex process of carcinogenesis, particularly in PCa. The gathered data were converted to semi-binary values (-1, -0.5, 0, 0.5, and 1), to which machine learning (ML) methods were applied. First, we cross-checked various publicly available datasets, some published RNA-seq datasets, and our whole-exome sequencing data to find genes that play a common role in PCa, diabetes, and obesity. To narrow down their common interacting partners, interactome networks were analysed using GeneMANIA and visualised using Cytoscape. cBioPortal, which provides plots of the various mutation types against mRNA expression (RNA-seq FPKM), was then used to compare expression levels based on Z-scores. The GEPIA 2 tool was used to compare the expression of the resulting shared genes between normal tissue and TCGA PCa samples. Later, top-ranking genes with striking clustering coefficients were selected using the Cytoscape cytoHubba module, and GEPIA 2 was applied again to generate survival plots.

Results: Comparing various publicly available datasets, BLM was found to be a frequent player in all three diseases, whereas comparing publicly available datasets, GWAS datasets, and published sequencing findings, SPFTPC and PPIMB were found to be the most common. With the assistance of GeneMANIA, TMPO and FOXP1 were identified as common interacting partners, and they were also seen to interact with BLM.

Conclusion: A probabilistic machine learning model was developed to identify key candidates shared among diabetes, obesity, and PCa. We believe this would herald precision-scale modeling for easier prognosis.
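The abstract names two data-preparation steps without giving their details: converting clinical parameters to five semi-binary levels, and cross-checking datasets for shared genes. The following is a minimal sketch of how such steps might look, not the authors' implementation. The threshold scheme (a reference range `low` to `high` with a borderline `margin`), the function names, and the placeholder gene identifiers are all assumptions made for illustration.

```python
from typing import Dict, Set


def encode_semi_binary(value: float, low: float, high: float, margin: float) -> float:
    """Map a clinical measurement onto the levels -1, -0.5, 0, 0.5, 1.

    Assumed scheme (not taken from the paper): values inside [low, high]
    score 0, values up to `margin` outside that range score -0.5 or 0.5,
    and values further out score -1 or 1.
    """
    if value < low - margin:
        return -1.0
    if value < low:
        return -0.5
    if value <= high:
        return 0.0
    if value <= high + margin:
        return 0.5
    return 1.0


def common_genes(gene_sets: Dict[str, Set[str]]) -> Set[str]:
    """Return the genes that appear in every named gene set."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical reference range 1.0 to 4.0 with a borderline margin of 2.0.
    for reading in (-2.0, 0.5, 3.0, 5.0, 7.0):
        print(reading, encode_semi_binary(reading, low=1.0, high=4.0, margin=2.0))

    # Placeholder gene identifiers, not real study data.
    example_sets = {
        "pca": {"G1", "G2", "G3"},
        "diabetes": {"G2", "G3"},
        "obesity": {"G3", "G4"},
    }
    print(common_genes(example_sets))
```

In a sketch like this, the encoded levels would form the feature matrix passed to an ML classifier, and the intersection would give the candidate list handed to a network tool. Real thresholds would have to come from the clinical literature the authors reviewed.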


Source journal

Current Genomics (Biology: Biochemistry & Molecular Biology)

CiteScore

5.20

Self-citation rate

0.00%

Articles published

Review time

>0 weeks

About the journal: Current Genomics is a peer-reviewed journal that provides essential reading about the latest and most important developments in genome science and related fields of research. Topics of particular interest include systems biology, systems modeling, machine learning, network inference, bioinformatics, computational biology, epigenetics, single cell genomics, extracellular vesicles, quantitative biology, and synthetic biology, as applied to the study of evolution, development, maintenance, and aging, and of human health, human diseases, clinical genomics, and precision medicine. The journal also covers plant genomics but will not consider articles dealing with breeding and livestock.

Current Genomics publishes three types of articles:

i) Research papers from internationally recognized experts reporting new and original data generated at the genome scale. Position papers dealing with new or challenging methodological approaches, whether experimental or mathematical, are greatly welcome in this section.

ii) Authoritative and comprehensive full-length or mini reviews from widely recognized experts, covering the latest developments in genome science and related fields of research such as systems biology, statistics and machine learning, quantitative biology, and precision medicine. Proposals for mini hot topics (2-3 review papers) and full hot topics (6-8 review papers) guest edited by internationally recognized experts are welcome in this section. Hot topic proposals should not contain original data and should include articles originating from at least 2 different countries.

iii) Opinion papers from internationally recognized experts addressing contemporary questions and issues in genome science and systems biology and in basic and clinical research practices.