{"title":"LOCC: a novel visualization and scoring of cutoffs for continuous variables with hepatocellular carcinoma prognosis as an example.","authors":"George Luo, Toby Chen, John J Letterio","doi":"10.1186/s12859-024-05932-1","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The interpretation of large datasets, such as The Cancer Genome Atlas (TCGA), for scientific and research purposes, remains challenging despite their public availability. In this study, we focused on identifying gene expression profiles most relevant to patient prognosis and aimed to develop a method and database to address this issue. To achieve this, we introduced Luo's Optimization Categorization Curve (LOCC), an innovative tool for visualizing and scoring continuous variables against dichotomous outcomes. To demonstrate the efficacy of LOCC using real-world data, we analyzed gene expression profiles and patient data from TCGA hepatocellular carcinoma samples.</p><p><strong>Results: </strong>To showcase LOCC, we demonstrate an optimal cutoff for E2F1 expression in hepatocellular carcinoma, which was subsequently validated in an independent cohort. Compared to ROC curves and their AUC, LOCC offered a superior description of the predictive value of E2F1 expression across various cancer types. The LOCC score, comprised of factors representing significance, range, and impact of the biomarker, facilitated the ranking of all gene expression profiles in hepatocellular carcinoma, aiding in the evaluation and understanding of previously published prognostic gene signatures. We also demonstrate that LOCC does not have the same assumptions required of Cox proportional hazards modeling for accurate analysis. Repeated sampling demonstrated that LOCC scores outperformed ROC's AUC in discriminating predictors from non-predictors. Additionally, gene set enrichment analysis revealed significant associations between certain genes and prognosis, such as E2F target genes and G2M checkpoint with poor prognosis, and bile acid metabolism and oxidative phosphorylation with good prognosis.</p><p><strong>Conclusion: </strong>In summary, we present LOCC as a novel visualization tool for the analysis of gene expression in cancer, particularly for understanding and selecting cutoffs. Our findings suggest that LOCC scores, which effectively rank genes based on their prognostic potential, represent a more suitable approach than ROC curves and Cox proportional hazard for prognostic modeling and understanding in cancer gene expression analysis. LOCC holds promise as an invaluable tool for advancing precision medicine and furthering biomarker research. Further research regarding multivariable integration and validation will help LOCC reach its full potential and establish its utility across diverse cancer types and clinical settings.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11438210/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05932-1","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Background: The interpretation of large datasets, such as The Cancer Genome Atlas (TCGA), for scientific and research purposes, remains challenging despite their public availability. In this study, we focused on identifying gene expression profiles most relevant to patient prognosis and aimed to develop a method and database to address this issue. To achieve this, we introduced Luo's Optimization Categorization Curve (LOCC), an innovative tool for visualizing and scoring continuous variables against dichotomous outcomes. To demonstrate the efficacy of LOCC using real-world data, we analyzed gene expression profiles and patient data from TCGA hepatocellular carcinoma samples.
Results: To showcase LOCC, we demonstrate an optimal cutoff for E2F1 expression in hepatocellular carcinoma, which was subsequently validated in an independent cohort. Compared to ROC curves and their AUC, LOCC offered a superior description of the predictive value of E2F1 expression across various cancer types. The LOCC score, comprised of factors representing significance, range, and impact of the biomarker, facilitated the ranking of all gene expression profiles in hepatocellular carcinoma, aiding in the evaluation and understanding of previously published prognostic gene signatures. We also demonstrate that LOCC does not have the same assumptions required of Cox proportional hazards modeling for accurate analysis. Repeated sampling demonstrated that LOCC scores outperformed ROC's AUC in discriminating predictors from non-predictors. Additionally, gene set enrichment analysis revealed significant associations between certain genes and prognosis, such as E2F target genes and G2M checkpoint with poor prognosis, and bile acid metabolism and oxidative phosphorylation with good prognosis.
Conclusion: In summary, we present LOCC as a novel visualization tool for the analysis of gene expression in cancer, particularly for understanding and selecting cutoffs. Our findings suggest that LOCC scores, which effectively rank genes based on their prognostic potential, represent a more suitable approach than ROC curves and Cox proportional hazard for prognostic modeling and understanding in cancer gene expression analysis. LOCC holds promise as an invaluable tool for advancing precision medicine and furthering biomarker research. Further research regarding multivariable integration and validation will help LOCC reach its full potential and establish its utility across diverse cancer types and clinical settings.
期刊介绍:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.