OUTCOME-GUIDED DISEASE SUBTYPING BY GENERATIVE MODEL AND WEIGHTED JOINT LIKELIHOOD IN TRANSCRIPTOMIC APPLICATIONS.

IF 1.4 4区数学 Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI:10.1214/23-aoas1865

Yujia Li, Peng Liu, Wenjia Wang, Wei Zong, Yusi Fang, Zhao Ren, Lu Tang, Juan C Celedón, Steffi Oesterreich, George C Tseng

{"title":"OUTCOME-GUIDED DISEASE SUBTYPING BY GENERATIVE MODEL AND WEIGHTED JOINT LIKELIHOOD IN TRANSCRIPTOMIC APPLICATIONS.","authors":"Yujia Li, Peng Liu, Wenjia Wang, Wei Zong, Yusi Fang, Zhao Ren, Lu Tang, Juan C Celedón, Steffi Oesterreich, George C Tseng","doi":"10.1214/23-aoas1865","DOIUrl":null,"url":null,"abstract":"<p><p>With advances in high-throughput technology, molecular disease subtyping by high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression pattern. The omics data, however, usually contain multi-faceted cluster structures that can be defined by different sets of gene. If the gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the development of a clustering framework with guidance from a pre-specified disease outcome, such as lung function measurement or survival, in this paper. We propose two disease subtyping methods by omics data with outcome guidance using a generative model or a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model by a latent variable of cluster labels. Compared to the generative model, weighted joint likelihood contains a data-driven weight parameter to balance the likelihood contributions from outcome association and gene cluster separation, which improves generalizability in independent validation but requires heavier computing. Extensive simulations and two real applications in lung disease and triple-negative breast cancer demonstrate superior disease subtyping performance of the outcome-guided clustering methods in terms of disease subtyping accuracy, gene selection and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm to directly identify patient subgroups with clinical association.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"1947-1964"},"PeriodicalIF":1.4000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309773/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Applied Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/23-aoas1865","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/5 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}

引用次数: 0

Abstract

With advances in high-throughput technology, molecular disease subtyping by high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression pattern. The omics data, however, usually contain multi-faceted cluster structures that can be defined by different sets of gene. If the gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the development of a clustering framework with guidance from a pre-specified disease outcome, such as lung function measurement or survival, in this paper. We propose two disease subtyping methods by omics data with outcome guidance using a generative model or a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model by a latent variable of cluster labels. Compared to the generative model, weighted joint likelihood contains a data-driven weight parameter to balance the likelihood contributions from outcome association and gene cluster separation, which improves generalizability in independent validation but requires heavier computing. Extensive simulations and two real applications in lung disease and triple-negative breast cancer demonstrate superior disease subtyping performance of the outcome-guided clustering methods in terms of disease subtyping accuracy, gene selection and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm to directly identify patient subgroups with clinical association.

查看原文本刊更多论文

转录组学应用中生成模型和加权联合似然的结果导向疾病亚型。

随着高通量技术的进步，利用高维组学数据进行疾病分子分型已被认为是识别具有不同发病机制和预后的复杂疾病亚型的有效方法。传统的聚类分析以组学数据为输入，生成具有相似基因表达模式的患者聚类。然而，组学数据通常包含多方面的簇结构，可以由不同的基因集来定义。如果与不相关的临床变量（例如，性别或年龄）相关的基因集在聚类过程中占主导地位，则所得的聚类可能无法捕获临床有意义的疾病亚型。在本文中，这激发了基于预先指定的疾病结果（如肺功能测量或生存率）指导的聚类框架的发展。我们提出了两种疾病分型方法组学数据与结果指导使用生成模型或加权联合似然。两种方法都通过聚类标签的潜在变量将结果关联模型和疾病亚型模型连接起来。与生成模型相比，加权联合似然包含一个数据驱动的权重参数来平衡结果关联和基因聚类分离的似然贡献，提高了独立验证的泛化性，但需要更多的计算。广泛的模拟和在肺部疾病和三阴性乳腺癌中的两个实际应用表明，结果导向聚类方法在疾病分型准确性、基因选择和结果关联方面具有优越的疾病分型性能。与现有的聚类方法不同，以结果为导向的疾病亚型框架创建了一种新的精准医学范式，可以直接识别具有临床关联的患者亚组。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Annals of Applied Statistics 社会科学-统计学与概率论

CiteScore

3.10

自引率

5.60%

发文量

131

审稿时长

6-12 weeks

期刊介绍： Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, our goal is to provide a timely and unified forum for all areas of applied statistics.