Cancer subtype identification by multi-omics clustering based on interpretable feature and latent subspace learning

IF 4.3 · CAS Zone 3 (Biology) · JCR Q1, BIOCHEMICAL RESEARCH METHODS

Methods, Volume 231, Pages 144-153 · Published 2024-09-24 · DOI: 10.1016/j.ymeth.2024.09.014

Tianyi Shi, Xiucai Ye, Dong Huang, Tetsuya Sakurai

Citations: 0

Abstract

In recent years, multi-omics clustering has become a powerful tool in cancer research, offering a comprehensive view of the diverse molecular characteristics of different cancer subtypes. However, most existing multi-omics clustering methods integrate heterogeneous features from different omics directly. Such methods may struggle with the noise and redundancy of multi-omics data, which leads to poor clustering results. We therefore propose a novel multi-omics clustering method that extracts interpretable and discriminative features from each omics type before data integration. Clinical information supervises this feature extraction through SHAP (SHapley Additive exPlanations) values. Singular value decomposition (SVD) is then applied to the extracted features of the different omics, constructing a latent subspace that integrates them. Finally, shared nearest neighbor (SNN)-based spectral clustering is applied to the latent representation to obtain the clustering result. The proposed method is evaluated on several cancer datasets spanning three omics levels and compared with several state-of-the-art multi-omics clustering methods. The results demonstrate its superior performance in multi-omics data analysis for cancer subtyping. Experiments further show that using clinical information, via SHAP values, for feature extraction improves clustering performance. In addition, enrichment analysis of the gene signatures identified in the different subtypes further demonstrates the effectiveness of the proposed method.
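The abstract outlines a three-stage pipeline:

1. Clinically supervised, SHAP-based feature selection for each omics type.
2. SVD to build a shared latent subspace.
3. SNN-based spectral clustering.

The NumPy sketch below illustrates that flow under explicit assumptions. It is not the authors' implementation, which is in their repository. The assumptions are:

- A least-squares linear surrogate replaces whatever supervised model the paper fits. A linear model with independent features has the closed-form SHAP value phi_ij = w_j * (x_ij - mean_j).
- The latent subspace is taken to be the top `rank` left singular vectors, scaled by their singular values, of the z-scored, concatenated selected features.
- SNN similarity is the fraction of shared k-nearest neighbours. Clustering uses Ng-Jordan-Weiss normalized spectral clustering with a plain k-means step.

All function names, the parameters `top_k`, `rank` and `k_nn`, and the synthetic data are hypothetical.

```python
import numpy as np


def linear_shap_importance(X, y):
    """Mean |SHAP| per feature for a least-squares linear surrogate model.

    ASSUMPTION: a linear model stands in for the authors' supervised model.
    For a linear model with independent features, the SHAP value of feature j
    on sample i is w_j * (x_ij - mean_j).
    """
    Xc = X - X.mean(axis=0)
    w, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
    return np.abs(Xc * w).mean(axis=0)


def select_features(X, y, top_k):
    """Keep the top_k columns of one omics matrix, ranked by mean |SHAP|."""
    order = np.argsort(linear_shap_importance(X, y))[::-1]
    return X[:, order[:top_k]]


def svd_latent(blocks, rank):
    """Z-score and concatenate the selected blocks, then keep the top `rank` SVD components."""
    Z = np.hstack([(B - B.mean(0)) / (B.std(0) + 1e-12) for B in blocks])
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :rank] * s[:rank]


def snn_affinity(H, k):
    """SNN similarity W_ij = |N_k(i) & N_k(j)| / k in the latent space."""
    d = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]
    M = np.zeros(d.shape)
    np.put_along_axis(M, knn, 1.0, axis=1)   # M[i, j] = 1 if j is in N_k(i)
    return (M @ M.T) / k                     # (M M^T)_ij counts shared neighbours


def kmeans(X, n_clusters, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == c].mean(0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(W, n_clusters):
    """Ng-Jordan-Weiss spectral clustering on a symmetric affinity matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(A)                         # eigenvalues in ascending order
    V = vecs[:, -n_clusters:]                           # eigenvectors of the largest ones
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return kmeans(V, n_clusters)


def multiomics_subtypes(omics, clinical, top_k, rank, k_nn, n_clusters):
    """Hypothetical end-to-end pipeline: select per omics, integrate by SVD, cluster."""
    blocks = [select_features(X, clinical, top_k) for X in omics]
    H = svd_latent(blocks, rank)
    return spectral_clustering(snn_affinity(H, k_nn), n_clusters)


# Toy demo on synthetic data (hypothetical; not the paper's datasets).
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(3), 30)      # simulated ground-truth subtypes
clinical = groups.astype(float)           # toy clinical variable tracking them
omics = []
for _ in range(3):                        # three stand-in omics layers
    X = rng.normal(size=(90, 40))
    X[:, :5] += 4.0 * groups[:, None]     # only the first 5 features are informative
    omics.append(X)
labels = multiomics_subtypes(omics, clinical, top_k=5, rank=2, k_nn=10, n_clusters=3)
print(labels)
```

On this toy data, each simulated subtype should come out as a single cluster. Replacing the linear surrogate with, for example, a tree ensemble explained by TreeSHAP would change only `linear_shap_importance`. The SVD and SNN stages do not depend on how the features were ranked.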

Availability: The proposed method is freely available at https://github.com/Tianyi-Shi-Tsukuba/Multi-omics-clustering-based-on-SHAP. Data will be made available on request.


Source journal

Methods (Biology: Biochemical Research Methods)

CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks

Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue is organized by a guest editor who is an expert in the area covered. It consists solely of invited, high-quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches, with emphasis on clear, detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods. Other helpful sections include comparisons of alternative methods that weigh their advantages and disadvantages, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.