A benchmarking study of copy number variation inference methods using single-cell RNA-sequencing data.

IF 5, Zone 4 (Medicine), Q1 MEDICINE, RESEARCH & EXPERIMENTAL

Precision Clinical Medicine Pub Date : 2025-06-04 eCollection Date: 2025-06-01 DOI:10.1093/pcmedi/pbaf011

Xin Chen, Li Tai Fang, Zhong Chen, Wanqiu Chen, Hongjin Wu, Bin Zhu, Malcolm Moos, Andrew Farmer, Xiaowen Zhang, Wei Xiong, Shusheng Gong, Wendell Jones, Christopher E Mason, Shixiu Wu, Chunlin Xiao, Charles Wang

Precision Clinical Medicine, volume 8, issue 2, article pbaf011. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204187/pdf/


Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful tool for cancer research, enabling in-depth characterization of tumor heterogeneity at the single-cell level. Recently, several scRNA-seq copy number variation (scCNV) inference methods have been developed, expanding the application of scRNA-seq to study genetic heterogeneity in cancer using transcriptomic data. However, the fidelity of these methods has not been investigated systematically.

Methods: We benchmarked five commonly used scCNV inference methods: HoneyBADGER, CopyKAT, CaSpER, inferCNV, and sciCNV. We evaluated their performance across four different scRNA-seq platforms using data from our previous multicenter study. We further evaluated their performance using scRNA-seq datasets derived from mixed samples consisting of five human lung adenocarcinoma cell lines. We also sequenced tissues from a small cell lung cancer patient and used these data to validate our findings with a clinical scRNA-seq dataset.

Results: We found that the sensitivity and specificity of the five scCNV inference methods varied, depending on the selection of reference data, sequencing depth, and read length. CopyKAT and CaSpER outperformed other methods overall, while inferCNV, sciCNV, and CopyKAT performed better than other methods in subclone identification. We found that batch effects significantly affected the performance of subclone identification in mixed datasets in most methods we tested.

Conclusion: Our benchmarking study revealed the strengths and weaknesses of each of these scCNV inference methods and provided guidance for selecting the optimal CNV inference method using scRNA-seq data.
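As an illustration of the two metrics named in the Results, the following is a minimal sketch of how sensitivity and specificity could be computed for one method's CNV calls against a reference. It is not the authors' evaluation pipeline, which the abstract does not describe. The function name `sensitivity_specificity`, the per-bin encoding (0 = copy-neutral, 1 = gain, -1 = loss), and the rule that a call in the wrong direction counts as a miss are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[int], calls: Sequence[int]
) -> Tuple[float, float]:
    """Compare per-bin CNV calls with a reference.

    Assumed encoding (hypothetical): 0 = copy-neutral, 1 = gain, -1 = loss.
    A truly altered bin is a true positive only if the call matches the
    direction of the alteration; otherwise it is a false negative.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")

    tp = fn = tn = fp = 0
    for t, c in zip(truth, calls):
        if t != 0:  # bin is truly altered
            if c == t:
                tp += 1
            else:
                fn += 1
        else:  # bin is truly copy-neutral
            if c == 0:
                tn += 1
            else:
                fp += 1

    # Return NaN when a metric is undefined (no altered or no neutral bins).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data, invented for illustration only.
truth = [0, 0, 1, 1, -1, 0, 0, 1]
calls = [0, 1, 1, 0, -1, 0, 0, 1]
print(sensitivity_specificity(truth, calls))  # (0.75, 0.75)
```

In a benchmark like this one, the reference labels would come from an orthogonal source such as bulk DNA sequencing. The bin size and the threshold for calling a bin altered would then be further choices that affect both metrics.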


Source journal

Precision Clinical Medicine (MEDICINE, RESEARCH & EXPERIMENTAL)

CiteScore: 10.80

Self-citation rate: 0.00%

Review time: 5 weeks

About the journal: Precision Clinical Medicine (PCM) is an international, peer-reviewed, open access journal that provides timely publication of original research articles, case reports, reviews, editorials, and perspectives across the spectrum of precision medicine. The journal's mission is to deliver new theories, methods, and evidence that enhance disease diagnosis, treatment, prevention, and prognosis, thereby establishing a vital communication platform for clinicians and researchers that has the potential to transform medical practice. PCM encompasses all facets of precision medicine, which involves personalized approaches to diagnosis, treatment, and prevention, tailored to individual patients or patient subgroups based on their unique genetic, phenotypic, or psychosocial profiles. The clinical conditions addressed by the journal include a wide range of areas such as cancer, infectious diseases, inherited diseases, complex diseases, and rare diseases.