Xiucai Ye, Tianyi Shi, Dong Huang, Tetsuya Sakurai
Methods, Volume 239, Pages 64-71 (published 2025-04-01). DOI: 10.1016/j.ymeth.2025.03.017. https://www.sciencedirect.com/science/article/pii/S1046202325000830
Multi-Omics clustering by integrating clinical features from large language model
Multi-omics clustering has emerged as a powerful approach for understanding complex biological systems and for cancer subtyping, because it integrates diverse omics data. Existing methods focus primarily on integrating different types of omics data and often overlook clinical context. In this study, we propose a framework that incorporates clinical features extracted by a large language model (LLM) to enhance multi-omics clustering. Using a BERT-based model, the framework extracts clinical data from pathology reports, converting unstructured medical text into structured clinical features. These features are integrated with the omics data through an autoencoder, which enriches the information content of each omics layer and improves feature extraction. The extracted features are then projected into a latent subspace using singular value decomposition (SVD), and spectral clustering is applied to obtain the final clustering result. We evaluate the framework on six cancer datasets spanning three omics levels and compare it with several state-of-the-art methods. The experimental results show that the framework outperforms existing methods in multi-omics clustering for cancer subtyping, and that integrating LLM-derived clinical features significantly improves clustering performance. This work underscores the importance of clinical context in multi-omics analysis and shows the potential of LLMs for advancing precision medicine.
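The abstract describes the pipeline only at a high level. The Python sketch below (NumPy only) illustrates the integration, SVD, and spectral clustering stages under explicit assumptions; it is not the authors' implementation. The clinical columns are synthetic stand-ins for the BERT-extracted features. They are attached to each omics layer by plain concatenation, and no autoencoder is trained: the enriched layers go straight to a truncated SVD. Clustering uses an RBF affinity with a median-distance bandwidth, followed by k-means on the row-normalized top eigenvectors of D^-1/2 W D^-1/2. All function names (`enrich_layers`, `svd_latent`, `kmeans`, `spectral_clustering`, `make_layer`) and parameter values (`dim=5`, `k=3`) are hypothetical.

```python
import numpy as np


def enrich_layers(omics_layers, clinical):
    """Hypothetical integration step: append z-scored clinical features to
    every omics layer (plain concatenation stands in for the paper's
    autoencoder-based integration, which is not reproduced here)."""
    c = (clinical - clinical.mean(axis=0)) / (clinical.std(axis=0) + 1e-8)
    return [np.hstack([layer, c]) for layer in omics_layers]


def svd_latent(layers, dim):
    """Concatenate the layers and project samples onto the top-`dim`
    right singular vectors (truncated SVD of the column-centered matrix)."""
    X = np.hstack(layers)
    X = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim] * s[:dim]  # equals X @ V[:, :dim]


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(Z, k):
    """Normalized spectral clustering on an RBF affinity graph."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    sigma2 = np.median(sq[sq > 0])  # median heuristic for the bandwidth
    W = np.exp(-sq / (2 * sigma2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    V = eigvecs[:, -k:]  # k largest = k smallest of the normalized Laplacian
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return kmeans(V, k)


# Synthetic demo: 3 subtypes x 30 samples, two omics layers, plus 3 made-up
# "clinical" columns standing in for LLM-extracted features.
rng = np.random.default_rng(42)
true = np.repeat([0, 1, 2], 30)


def make_layer(dim):
    centers = rng.normal(0.0, 3.0, size=(3, dim))
    return centers[true] + rng.normal(0.0, 0.5, size=(true.size, dim))


omics = [make_layer(20), make_layer(15)]
clinical = make_layer(3)
Z = svd_latent(enrich_layers(omics, clinical), dim=5)
pred = spectral_clustering(Z, k=3)
print(pred)
```

The label numbering in `pred` is arbitrary. Only the grouping of samples is meaningful. On real cohorts, the latent dimension, the affinity bandwidth, and the number of clusters would all need tuning, and the abstract does not report the authors' settings. Libraries such as scikit-learn also ship ready-made truncated SVD and spectral clustering estimators. The hand-written version above simply keeps each step visible.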
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.