A clustering approach to integrative analyses of multiomic cancer data.

Impact factor 1.1; Zone 4 (Mathematics); JCR Q2, Statistics & Probability

Journal of Applied Statistics Pub Date : 2024-11-29 eCollection Date: 2025-01-01 DOI:10.1080/02664763.2024.2431742

Dongyan Yan, Subharup Guha

Journal of Applied Statistics, volume 52, issue 8, pages 1539-1560. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12147493/pdf/

Citations: 0

Abstract

Rapid technological advances have allowed for molecular profiling across multiple omics domains for clinical decision-making in many diseases, especially cancer. However, as tumor development and progression are biological processes involving composite genomic aberrations, key challenges are to effectively assimilate information from these domains to identify genomic signatures and druggable biological entities, develop accurate risk prediction profiles for future patients, and identify novel patient subgroups for tailored therapy and monitoring. We propose integrative frameworks for high-dimensional multiple-domain cancer data. These Bayesian mixture model-based approaches coherently incorporate dependence within and between domains to accurately detect tumor subtypes, thus providing a catalog of genomic aberrations associated with cancer taxonomy. The flexible and scalable Bayesian nonparametric strategy performs simultaneous bidirectional clustering of the tumor samples and genomic probes to achieve dimension reduction. We describe an efficient variable selection procedure that can identify relevant genomic aberrations and potentially reveal underlying drivers of disease. Although the work is motivated by lung cancer datasets, the proposed methods are broadly applicable in a variety of contexts involving high-dimensional data. The success of the methodology is demonstrated using artificial data and lung cancer omics profiles publicly available from The Cancer Genome Atlas.
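
The abstract does not spell out the model, so the following is only a rough illustration of what "Bayesian nonparametric bidirectional clustering" can look like, not the authors' method. It assumes a Chinese restaurant process (a standard nonparametric partition prior) drawn independently for tumor samples and genomic probes, and it summarizes the data matrix by block averages to show the dimension reduction. The function names `sample_crp_partition` and `block_means`, the concentration values, and the simulated data are all hypothetical.

```python
import random


def sample_crp_partition(n_items, concentration, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    The number of clusters is not fixed in advance: each item joins an
    existing cluster with probability proportional to its size, or opens
    a new cluster with probability proportional to `concentration`.
    """
    if n_items <= 0:
        return []
    labels = [0]
    cluster_sizes = [1]
    for _ in range(1, n_items):
        weights = cluster_sizes + [concentration]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


def block_means(matrix, row_labels, col_labels):
    """Average matrix entries within each (row cluster, column cluster) block."""
    n_row_clusters = max(row_labels) + 1
    n_col_clusters = max(col_labels) + 1
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            sums[r][c] += matrix[i][j]
            counts[r][c] += 1
    return [
        [sums[r][c] / counts[r][c] for c in range(n_col_clusters)]
        for r in range(n_row_clusters)
    ]


# Hypothetical example: 20 tumor samples (rows) and 200 probes (columns).
rng = random.Random(0)
sample_labels = sample_crp_partition(20, 1.0, rng)
probe_labels = sample_crp_partition(200, 2.0, rng)
data = [[rng.gauss(0.0, 1.0) for _ in probe_labels] for _ in sample_labels]
reduced = block_means(data, sample_labels, probe_labels)
print(len(data), "x", len(data[0]), "->", len(reduced), "x", len(reduced[0]))
```

In a full analysis the partitions would be inferred from the data through posterior sampling rather than drawn from the prior; the sketch only shows how clustering both rows and columns collapses a large matrix into a much smaller block summary.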


Source journal

Journal of Applied Statistics (Mathematics: Statistics & Probability)

CiteScore

3.40

Self-citation rate

0.00%

Articles published

126

Review time

6 months

About the journal: Journal of Applied Statistics provides a forum for communication between applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided, but those on theoretical developments that clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.