Batch correction methods used in single-cell RNA sequencing analyses are often poorly calibrated
Sindri Emmanúel Antonsson, Páll Melsted
Genome Research, published 2025-07-07
DOI: 10.1101/gr.279886.124
Abstract
As the number of experiments that employ single-cell RNA sequencing (scRNA-seq) grows, so does the possibility of combining results across experiments or of processing cells from the same experiment that were assayed in separate sequencing runs. The gain in the number of cells that can be compared, however, comes at the cost of potential batch effects. Several methods have been proposed to correct for such effects in scRNA-seq data sets. We compare eight widely used batch correction methods for scRNA-seq data. We present a novel approach to measure the degree to which these methods alter the data in the process of batch correction, both at the fine scale, by comparing distances between cells, and at the cluster level, by measuring effects observed across clusters of cells. We demonstrate that many of the published methods are poorly calibrated, in the sense that the correction process creates measurable artifacts in the data. In particular, MNN, SCVI, and LIGER perform poorly in our tests, often altering the data considerably. Batch correction with ComBat, ComBat-seq, BBKNN, and Seurat also introduces artifacts that are detectable in our setup. In contrast, we find that Harmony is the only method that consistently performs well across all of the tests we present. Therefore, Harmony is the only method we recommend for batch correction of scRNA-seq data.
About the journal:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or as methods and resource reports that provide novel information on technologies or tools of interest to a broad readership. Complete data sets are presented electronically on the journal's website where appropriate. The journal also publishes Reviews, Perspectives, and Insight/Outlook articles, which comment on the latest advances published both here and elsewhere, placing such progress in its broader biological context.