Estimating the effect of tissue- and blood-derived cell reference matrices on deconvolving bulk transcriptomic datasets.

IF 4.1 · Region 2 (Biology) · JCR Q2 (Biochemistry & Molecular Biology)
Computational and structural biotechnology journal Pub Date : 2025-08-05 eCollection Date: 2025-01-01 DOI:10.1016/j.csbj.2025.07.058
Siqi Sun, Shweta Yadav, Mulini Pingili, Dan Chang, Jing Wang
Volume 27, pages 3579-3588. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356330/pdf/
Citations: 0

Abstract

Cell deconvolution is a widely used method for characterizing the composition of mixed cell populations in bulk transcriptomic datasets. While tissue- and blood-derived cell reference matrices (CRMs) are commonly used, their impact on deconvolution accuracy has yet to be systematically evaluated. In this study, we developed tissue- and blood-derived CRMs using single-cell RNA sequencing (scRNA-seq) data from inflammatory bowel disease (IBD). Three publicly available blood-derived CRMs (IRIS, LM22, and ImmunoStates) were incorporated for benchmarking. Deconvolution performance was evaluated on both public bulk transcriptomic datasets and simulated pseudobulk samples, using goodness-of-fit and the correlation of cell fractions. Two infliximab-treated bulk datasets were used to identify treatment-related cell types. In addition, lung adenocarcinoma (LUAD) single-cell and bulk transcriptomic datasets were used for deconvolution evaluation. We found that tissue-derived CRMs consistently outperformed blood-derived CRMs in deconvolving bulk tissue transcriptomes, exhibiting higher goodness-of-fit and more accurate cellular proportion estimates, particularly for immune and stromal cells. They also revealed more treatment-related cell types. In contrast, all CRMs performed similarly when applied to bulk blood transcriptomes. The same trends were observed in the LUAD datasets. Our results emphasize the importance of selecting appropriate CRMs for cell deconvolution in bulk tissue transcriptomes, particularly in immunology and oncology. The same considerations may extend to other disease areas. The R package (DeconvRef) for building user-defined CRMs is available at https://github.com/alohasiqi/DeconvRef.
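The two evaluation metrics named in the abstract, goodness-of-fit and cell-fraction correlation, can be illustrated with a toy sketch. This is not the authors' pipeline and does not use DeconvRef (which is an R package): the reference matrix, the cell-type labels, the noise-free pseudobulk samples, and the multiplicative-update non-negative least squares solver are all assumptions chosen for illustration. A "blood-like" reference is imitated simply by dropping the stromal column.

```python
# Toy sketch: reference-based deconvolution on simulated pseudobulk samples.
# All numbers and names below are hypothetical, not taken from the paper.

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def deconvolve(reference, bulk, iters=500):
    """Non-negative least squares via multiplicative updates.

    reference: genes x cell types (non-negative), bulk: one value per gene.
    Returns non-negative coefficients, one per cell type.
    """
    ref_t = [list(col) for col in zip(*reference)]
    target = matvec(ref_t, bulk)                     # R^T b
    coef = [1.0] * len(ref_t)
    for _ in range(iters):
        denom = matvec(ref_t, matvec(reference, coef))  # R^T R f
        coef = [c * t / (d + 1e-12) for c, t, d in zip(coef, target, denom)]
    return coef

# Hypothetical "tissue-like" reference: 6 genes x 3 cell types
# (columns: immune A, immune B, stromal).
tissue_ref = [
    [10, 1, 1], [8, 2, 1],
    [1, 9, 2], [2, 7, 1],
    [1, 1, 12], [1, 2, 9],
]
# Hypothetical "blood-like" reference: same genes, stromal column missing.
blood_ref = [row[:2] for row in tissue_ref]

true_fractions = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
pseudobulk = [matvec(tissue_ref, f) for f in true_fractions]  # noise-free

def mean_goodness_of_fit(reference):
    """Average correlation between observed and reconstructed expression."""
    fits = []
    for bulk in pseudobulk:
        coef = deconvolve(reference, bulk)
        fits.append(pearson(bulk, matvec(reference, coef)))
    return sum(fits) / len(fits)

gof_tissue = mean_goodness_of_fit(tissue_ref)
gof_blood = mean_goodness_of_fit(blood_ref)

# Cell-fraction correlation: estimated vs. true fraction of the first
# cell type across samples, using the full reference.
estimated = []
for bulk in pseudobulk:
    coef = deconvolve(tissue_ref, bulk)
    estimated.append([c / sum(coef) for c in coef])
fraction_corr = pearson([e[0] for e in estimated],
                        [t[0] for t in true_fractions])

print(f"goodness-of-fit, tissue-like reference: {gof_tissue:.3f}")
print(f"goodness-of-fit, blood-like reference:  {gof_blood:.3f}")
print(f"fraction correlation (first cell type): {fraction_corr:.3f}")
```

In this toy setting the reference that lacks the stromal column cannot reconstruct the stromal marker genes, so its goodness-of-fit is lower. Real benchmarks involve noise, thousands of genes, and dedicated deconvolution tools, so the sketch only conveys what the metrics measure.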

Source journal
Computational and Structural Biotechnology Journal (Biochemistry, Genetics and Molecular Biology: Biophysics)
CiteScore: 9.30
Self-citation rate: 3.30%
Articles published: 540
Review time: 6 weeks
Journal description: Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to: structure and function of proteins, nucleic acids and other macromolecules; structure and function of multi-component complexes; protein folding, processing and degradation; enzymology; computational and structural studies of plant systems; microbial informatics; genomics; proteomics; metabolomics; algorithms and hypothesis in bioinformatics; mathematical and theoretical biology; computational chemistry and drug discovery; microscopy and molecular imaging; nanotechnology; systems and synthetic biology.