Unsupervised cell line embedding using pairwise drug response correlation.
Authors: Yutae Kim, Doheon Lee
Journal: Computational and Structural Biotechnology Journal, volume 27, pages 2566-2573
Publication date: 2025-06-11
DOI: https://doi.org/10.1016/j.csbj.2025.06.018
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205321/pdf/
Citations: 0
Abstract
Human cell line models are essential for understanding diseases and cellular functions. They are particularly important in drug discovery because they enable systematic screening of chemical compounds and their effects. However, heterogeneous measurement techniques and the fragmented characterization of cell lines across chemical screening and omics data make it difficult to use these models to their full potential. To address this, we introduce an unsupervised deep learning model based on contrastive learning that integrates heterogeneous drug response screening data into a unified cell line embedding. We used drug response data from 1,136 cell lines to train the embedding model and subsequently embedded 537 additional cell lines that were not included in training, thereby covering the full set of 1,673 cancer cell lines from the Cancer Dependency Map (DepMap) that have corresponding gene expression data. We demonstrate that incorporating the embedding improves machine learning performance, to varying degrees, on drug response-related downstream tasks, including prediction of drug synergy and drug response in cell lines. Furthermore, we applied SHapley Additive exPlanations (SHAP) to identify genes that contribute significantly to the embedding and found that these genes are strongly associated with drug resistance and with multiple types of cancer.
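The abstract does not say how the pairwise drug response correlation in the title is computed or how it enters the contrastive objective. As a rough illustration only, the sketch below assumes a Pearson correlation between two cell lines' response values, restricted to the drugs on which both were screened. The function name `pairwise_response_correlation`, the `min_shared` threshold, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import Optional, Sequence


def pairwise_response_correlation(
    a: Sequence[Optional[float]],
    b: Sequence[Optional[float]],
    min_shared: int = 3,  # hypothetical threshold, not from the paper
) -> Optional[float]:
    """Pearson correlation of two cell lines' drug responses.

    Each sequence holds one response value per drug, in the same drug
    order, with None where the cell line was not screened on that drug.
    Returns None when too few drugs are shared or a response is constant.
    """
    # Keep only drugs measured in both cell lines.
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(shared) < min_shared:
        return None

    xs, ys = zip(*shared)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in shared)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant vector

    return cov / math.sqrt(var_x * var_y)


# Toy response profiles over five drugs (made-up values for illustration).
line_a = [0.9, 0.5, None, 0.2, 0.7]
line_b = [0.8, 0.4, 0.6, 0.1, None]
line_c = [0.1, 0.5, 0.3, 0.9, 0.2]

r_ab = pairwise_response_correlation(line_a, line_b)
r_ac = pairwise_response_correlation(line_a, line_c)
```

Under this assumption, a score of this kind could act as a similarity signal between cell lines, for example to decide which pairs a contrastive model treats as similar. Whether the authors use it in that way is not stated in the abstract.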
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology