Multi-Omics clustering by integrating clinical features from large language model

IF 4.3 | Zone 3, Biology | JCR Q1, Biochemical Research Methods

Methods | Pub Date: 2025-04-01 | DOI: 10.1016/j.ymeth.2025.03.017

Xiucai Ye, Tianyi Shi, Dong Huang, Tetsuya Sakurai

Methods, Volume 239, Pages 64-71. Full text: https://www.sciencedirect.com/science/article/pii/S1046202325000830

Citations: 0

Abstract



Multi-omics clustering has emerged as a powerful approach for understanding complex biological systems and enabling cancer subtyping by integrating diverse omics data. Existing methods primarily focus on the integration of different types of omics data, often overlooking the value of clinical context. In this study, we propose a novel framework that incorporates clinical features extracted by a large language model (LLM) to enhance multi-omics clustering. Leveraging clinical data extracted from pathology reports using a BERT-based model, our framework converts unstructured medical text into structured clinical features. These features are integrated with omics data through an autoencoder, enriching the information content of each omics layer to improve feature extraction. The extracted features are then projected into a latent subspace using singular value decomposition (SVD), followed by spectral clustering to obtain the final clustering result. We evaluate the proposed framework on six cancer datasets across three omics levels, comparing it with several state-of-the-art methods. The experimental results demonstrate that the proposed framework outperforms existing methods in multi-omics clustering for cancer subtyping. Moreover, the results highlight the efficacy of integrating clinical features derived from the LLM, which significantly enhances clustering performance. This work underscores the importance of clinical context in multi-omics analysis and showcases the transformative potential of LLMs in advancing precision medicine.
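The last two stages described in the abstract (projection into a latent subspace with SVD, then spectral clustering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the BERT-based extraction step is skipped and the clinical features are taken as an already numeric matrix; the autoencoder that fuses clinical and omics features is replaced by plain column-wise concatenation; the data are synthetic; the kernel width (median squared distance) and `rank=5` are arbitrary choices; and every function name (`make_toy_data`, `cluster_patients`, and the helpers) is hypothetical.

```python
import numpy as np


def zscore(X):
    """Standardise each column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of NaN
    return (X - X.mean(axis=0)) / std


def svd_project(X, rank):
    """Coordinates of each row in the top-`rank` singular subspace of X."""
    U, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :rank] * s[:rank]


def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def spectral_clustering(Z, k):
    """Normalised spectral clustering on an RBF affinity graph."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / np.median(sq[sq > 0]))  # assumed kernel-width heuristic
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    M = W / np.sqrt(np.outer(d, d))          # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    V = vecs[:, -k:]                         # top-k eigenvectors
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return kmeans(V, k)


def cluster_patients(omics, clinical, k, rank=5):
    """Augment each omics layer with clinical features, project, cluster.

    Concatenation stands in for the autoencoder described in the abstract.
    """
    C = zscore(clinical)
    latent = [svd_project(np.hstack([zscore(X), C]), rank) for X in omics]
    return spectral_clustering(np.hstack(latent), k)


def make_toy_data(seed=0, n_per_group=20, k=3):
    """Synthetic stand-in for three omics layers plus clinical features."""
    rng = np.random.default_rng(seed)
    truth = np.repeat(np.arange(k), n_per_group)

    def layer(p):
        centers = rng.normal(0.0, 3.0, size=(k, p))
        return centers[truth] + rng.normal(size=(len(truth), p))

    return [layer(30), layer(20), layer(25)], layer(4), truth


if __name__ == "__main__":
    omics, clinical, truth = make_toy_data()
    print(cluster_patients(omics, clinical, k=3))
```

On real data the number of clusters and the latent rank would have to be chosen per dataset, and a nonlinear autoencoder could capture interactions between clinical and omics features that simple concatenation cannot.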


Source journal

Methods (Biology: Biochemical Research Methods)

CiteScore: 9.80

Self-citation rate: 2.10%

Articles published: 222

Review time: 11.3 weeks

About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.