{"title":"TEAM: A multiple testing algorithm on the aggregation tree for flow cytometry analysis","authors":"J. Pura, Xuechan Li, Cliburn Chan, Jichun Xie","doi":"10.1214/22-aoas1645","DOIUrl":null,"url":null,"abstract":"In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs di er. Further screening of these di erential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying di erential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of di erential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor di erential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fineto coarse-resolution. The procedure achieves the statistical goal of pinpointing density di erences to the smallest possible regions. TEAM is computationally e cient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally e cient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Annals of Applied Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1214/22-aoas1645","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
TEAM: A multiple testing algorithm on the aggregation tree for flow cytometry analysis
In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs di er. Further screening of these di erential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying di erential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of di erential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor di erential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fineto coarse-resolution. The procedure achieves the statistical goal of pinpointing density di erences to the smallest possible regions. TEAM is computationally e cient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally e cient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.