Shruti Eswar, Zachary T Koenig, Amanda R Tursi, José Cobeña-Reyes, Tamara Tilburgs, Sandra Andorf
Title: CytoBatchFlagR: A Comprehensive Framework to Objectively Assess High-Parameter Cytometry Data for Batch Effects
Journal: Cytometry Part A (impact factor 2.1; JCR Q3, Biochemical Research Methods)
DOI: 10.1002/cyto.a.70024 (https://doi.org/10.1002/cyto.a.70024)
Publication date: 2026-04-12
Publication type: Journal Article
Abstract
Rapid advancements in mass and flow cytometry technologies have allowed researchers to generate and analyze high-dimensional single-cell datasets, often utilizing upwards of 40 protein markers. Such high-parameter cytometry is increasingly used in longitudinal immunological studies, but technical variations across experimental batch runs can confound biological signals. To mitigate the impact on downstream analyses, many studies include reference control samples in every run, and several approaches exist to adjust for batch effects. However, tools that objectively identify problematic batches and markers within a dataset are limited. We introduce CytoBatchFlagR, a comprehensive and interpretable tool designed to flag batch-related problems at the marker and cell cluster level based on robust statistical evaluations. Batch and marker variations are assessed based on median signal intensities of negative and positive cell populations and positive cell frequencies, along with the Earth Mover's Distance (EMD) between signal intensity distributions. Additionally, CytoBatchFlagR identifies cell-type-specific batch problems via unsupervised clustering. The tool is suitable for mass and flow cytometry datasets, where it objectively detects distinct types of batch issues. We developed and tested CytoBatchFlagR using three cytometry datasets to demonstrate its utility and performance. We also demonstrated CytoBatchFlagR's effectiveness in assessing datasets that include or lack reference controls. CytoBatchFlagR improves quality control by enabling objective identification of technical variations that may impact downstream analysis in high-parameter cytometry data. The tool uses a series of complementary metrics to identify potential batch-related problems at the marker and cell population level and presents the results through interpretable visualizations. This allows users to make informed decisions about whether to apply batch correction or exclude specific batches or markers from downstream analyses. CytoBatchFlagR is freely available as R scripts, with documentation and a tutorial to help users get started.
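To make the metrics named in the abstract concrete, the following is a minimal Python sketch of how a one-dimensional Earth Mover's Distance and the per-marker summaries (median negative and positive intensities, positive cell frequency) could be computed for two batches. It is not CytoBatchFlagR's code, which is distributed as R scripts. The function names `emd_1d` and `marker_summary`, the fixed positivity `threshold`, and the toy intensity values are all assumptions made for illustration; the abstract does not state how positive and negative populations are separated.

```python
from bisect import bisect_right
from statistics import median


def emd_1d(a, b):
    """1-D Earth Mover's Distance: area between the two empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = bisect_right(xs, left) / len(xs)
        cdf_b = bisect_right(ys, left) / len(ys)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def marker_summary(values, threshold):
    """Median of negative/positive cells and percent positive.

    `threshold` is a hypothetical fixed cutoff used only for this sketch.
    """
    pos = [v for v in values if v > threshold]
    neg = [v for v in values if v <= threshold]
    return {
        "median_neg": median(neg) if neg else None,
        "median_pos": median(pos) if pos else None,
        "pct_pos": 100.0 * len(pos) / len(values),
    }


# Toy (made-up) transformed intensities of one marker in two batches;
# batch_b is batch_a shifted upward by 0.5 to mimic a technical offset.
batch_a = [0.1, 0.2, 0.3, 2.0, 2.2]
batch_b = [v + 0.5 for v in batch_a]

shift = emd_1d(batch_a, batch_b)
summary_a = marker_summary(batch_a, threshold=1.0)
summary_b = marker_summary(batch_b, threshold=1.0)
print(shift, summary_a, summary_b)
```

In this toy case the positive frequency is identical in both batches while the medians and the EMD both move, which hints at why several complementary metrics can separate different kinds of batch issues better than any single one.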
About the journal:
Cytometry Part A, the journal of quantitative single-cell analysis, features original research reports and reviews of innovative scientific studies employing quantitative single-cell measurement, separation, manipulation, and modeling techniques, as well as original articles on mechanisms of molecular and cellular functions obtained by cytometry techniques.
The journal welcomes submissions from multiple research fields that fully embrace the study of the cytome:
Biomedical Instrumentation Engineering
Biophotonics
Bioinformatics
Cell Biology
Computational Biology
Data Science
Immunology
Parasitology
Microbiology
Neuroscience
Cancer
Stem Cells
Tissue Regeneration.