Unsupervised cell line embedding using pairwise drug response correlation.

IF 4.1 · Zone 2 (Biology) · Q2 BIOCHEMISTRY & MOLECULAR BIOLOGY

Computational and structural biotechnology journal Pub Date : 2025-06-11 eCollection Date: 2025-01-01 DOI:10.1016/j.csbj.2025.06.018

Yutae Kim, Doheon Lee

Computational and structural biotechnology journal, volume 27, pages 2566-2573. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205321/pdf/

Citations: 0

Abstract

Human cell line models are essential for understanding diseases and cellular functions. They are particularly important in drug discovery because they enable systematic screening of chemical compounds and their effects. However, heterogeneous measurement techniques and the fragmented characterization of cell lines across chemical screening and omics data make it difficult to use these models to their full potential. To address this, we introduce an unsupervised deep learning model based on contrastive learning that integrates heterogeneous drug response screening data into a unified cell line embedding. Using the resulting embedding improves, to varying degrees, the performance of downstream machine learning tasks involving drugs and cell lines. We trained the embedding model on drug response data from 1,136 cell lines and subsequently embedded 537 additional cell lines that were not included in training, thereby covering the full set of 1,673 cancer cell lines from the Cancer Dependency Map (DepMap) that have corresponding gene expression data. We demonstrate that incorporating the embedding into various drug response-related tasks, including prediction of drug synergy and of drug response in cell lines, improves machine learning performance. Furthermore, we applied SHapley Additive exPlanations (SHAP) to identify genes with significant contributions to the embedding and found that these genes are strongly associated with drug resistance in various cancers and with multiple cancer types.
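The abstract does not say how the pairwise drug response correlation is computed or how it enters the contrastive objective. As a rough illustration only of the quantity named in the title, the sketch below computes a Pearson correlation between every pair of cell lines over the drugs both were screened on. The function name, the `min_shared` cutoff, and the toy matrix are all hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_response_correlation(responses, min_shared=3):
    """Pearson correlation between cell lines over their shared drugs.

    responses: (n_cell_lines, n_drugs) array, NaN where a drug was not screened.
    Returns an (n_cell_lines, n_cell_lines) matrix. Entries stay NaN when
    fewer than `min_shared` drugs overlap or when a profile is constant.
    `min_shared` is an illustrative cutoff, not a value from the paper.
    """
    responses = np.asarray(responses, dtype=float)
    n = responses.shape[0]
    corr = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            # Keep only drugs measured in both cell lines.
            shared = ~np.isnan(responses[i]) & ~np.isnan(responses[j])
            if shared.sum() < min_shared:
                continue
            x, y = responses[i, shared], responses[j, shared]
            if x.std() == 0 or y.std() == 0:
                continue  # correlation is undefined for a constant profile
            corr[i, j] = corr[j, i] = np.corrcoef(x, y)[0, 1]
    return corr


# Toy data invented for illustration: 3 cell lines x 4 drugs.
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, np.nan],
    [4.0, 3.0, 2.0, 1.0],
])
corr = pairwise_response_correlation(toy)
```

In a contrastive setup, a matrix like `corr` could plausibly be used to decide which cell line pairs count as similar, but that step is an assumption here and is not described in the abstract.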

Source journal

Computational and structural biotechnology journal (Biochemistry, Genetics and Molecular Biology-Biophysics)

CiteScore: 9.30
Self-citation rate: 3.30%
Articles published: 540
Review time: 6 weeks

Journal description: Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to: Structure and function of proteins, nucleic acids and other macromolecules; Structure and function of multi-component complexes; Protein folding, processing and degradation; Enzymology; Computational and structural studies of plant systems; Microbial Informatics; Genomics; Proteomics; Metabolomics; Algorithms and Hypothesis in Bioinformatics; Mathematical and Theoretical Biology; Computational Chemistry and Drug Discovery; Microscopy and Molecular Imaging; Nanotechnology; Systems and Synthetic Biology.