Estelle Medous, C. Goga, A. Ruiz-Gazen, J. Beaumont, A. Dessertaine, Pauline Puech
{"title":"Many-to-One indirect sampling with application to the French postal traffic estimation","authors":"Estelle Medous, C. Goga, A. Ruiz-Gazen, J. Beaumont, A. Dessertaine, Pauline Puech","doi":"10.1214/22-aoas1653","DOIUrl":"https://doi.org/10.1214/22-aoas1653","url":null,"abstract":"In social and economic surveys, it can be difficult to directly reach units of the target population, and indirect sampling is often advocated to solve this issue. In indirect sampling, the sample is drawn from a frame population that is linked to the target population, and estimation of target population parameters is typically achieved through the Generalized Weight Share Method (GWSM). This method provides a weight, for every unit of the target population, that depends, on the one hand, on the sampling weights in the frame population and, on the other hand, on the link weights between the frame population and the target population. In the present study, we focus on the situation in which the units from the frame population are linked to one and only one unit from the target population (Many-to-One case). This situation is encountered at the French postal service where addresses are sampled instead of postman rounds. We aim to understand the impact of the link weights on the efficiency of the GWSM estimators. We derive variance expressions and optimality results for a large class of sampling designs. Moreover, we note that the Many-to-One case can lead to too many links to observe. We alleviate the problem by introducing an intermediate population and double indirect sampling. The question of the loss of precision in this situation is discussed in detail through theoretical results and simulations. 
These findings help to explain the loss of precision of double GWSM estimators observed recently at the French postal service.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127903140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Wang, J. Ning, Ying Xu, Y. Shih, Yu Shen, Liang Li
{"title":"An extension of estimating equations to model longitudinal medical cost trajectory with Medicare claims data linked to SEER cancer registry","authors":"S. Wang, J. Ning, Ying Xu, Y. Shih, Yu Shen, Liang Li","doi":"10.1214/22-aoas1659","DOIUrl":"https://doi.org/10.1214/22-aoas1659","url":null,"abstract":"","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131667044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential sampling in prospective observational studies","authors":"Mary M Ryan, D. Gillen","doi":"10.1214/22-aoas1620","DOIUrl":"https://doi.org/10.1214/22-aoas1620","url":null,"abstract":"","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134085590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rui Duan, Jiayi Tong, Lifeng Lin, Lisa Levine, Mary Sammel, Joel Stoddard, Tianjing Li, Christopher H Schmid, Haitao Chu, Yong Chen
{"title":"PALM: Patient-centered treatment ranking via large-scale multivariate network meta-analysis","authors":"Rui Duan, Jiayi Tong, Lifeng Lin, Lisa Levine, Mary Sammel, Joel Stoddard, Tianjing Li, Christopher H Schmid, Haitao Chu, Yong Chen","doi":"10.1214/22-aoas1652","DOIUrl":"https://doi.org/10.1214/22-aoas1652","url":null,"abstract":"The growing number of available treatment options has led to urgent needs for reliable answers when choosing the best course of treatment for a patient. As it is often infeasible to compare a large number of treatments in a single randomized controlled trial, multivariate network meta-analyses (NMAs) are used to synthesize evidence from trials of a subset of the treatments, where both efficacy and safety related outcomes are considered simultaneously. However, these large-scale multiple-outcome NMAs have created challenges to existing methods due to the increasing complexity of the unknown correlations between outcomes and treatment comparisons. In this paper, we proposed a new framework for PAtient-centered treatment ranking via Large-scale Multivariate network meta-analysis, termed as PALM, which includes a parsimonious modeling approach, a fast algorithm for parameter estimation and inference, a novel visualization tool for presenting multivariate outcomes, termed as the origami plot, as well as personalized treatment ranking procedures taking into account the individual’s considerations on multiple outcomes. 
In application to an NMA that compares 14 treatment options for labor induction, we provided a comprehensive illustration of the proposed framework and demonstrated its computational efficiency and practicality, and we obtained new insights and evidence to support patient-centered clinical decision making.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136173522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subject-specific Dirichlet-multinomial regression for multi-district microbiota data analysis","authors":"M. Pedone, A. Amedei, F. Stingo","doi":"10.1214/22-aoas1641","DOIUrl":"https://doi.org/10.1214/22-aoas1641","url":null,"abstract":"","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128972552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control charts for dynamic process monitoring with an application to air pollution surveillance","authors":"Xiulin Xie, P. Qiu","doi":"10.1214/22-aoas1615","DOIUrl":"https://doi.org/10.1214/22-aoas1615","url":null,"abstract":"Air pollution is a major global public health risk factor. Among all air pollutants, PM2.5 is especially harmful. It has been well demonstrated that chronic exposure to PM2.5 can cause many health problems, including asthma, lung cancer and cardiovascular diseases. To tackle problems caused by air pollution, governments have put a huge amount of resources to improve air quality and reduce the impact of air pollution on public health. In this effort, it is extremely important to develop an air pollution surveillance system to constantly monitor the air quality over time, and give a signal promptly once the air quality is found to deteriorate so that a timely government intervention can be implemented. To monitor a sequential process, a major statistical tool is the statistical process control (SPC) chart. However, traditional SPC charts are based on the assumptions that process observations at different time points are independent and identically distributed. These assumptions are rarely valid in environmental data because seasonality and serial correlation are common in such data. To overcome this difficulty, we suggest a new control chart in this paper, which can properly accommodate dynamic temporal pattern and serial correlation in a sequential process. Thus, it can be used for effective air pollution surveillance. This method is demonstrated by an application to monitor the daily average PM2.5 levels in Beijing, and shown to be effective and reliable in detecting the increase of PM2.5 levels.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129205101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized theme dictionary models for association pattern discovery","authors":"Yang Yang, Ke Deng","doi":"10.1214/22-aoas1626","DOIUrl":"https://doi.org/10.1214/22-aoas1626","url":null,"abstract":"","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134364271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEAM: A multiple testing algorithm on the aggregation tree for flow cytometry analysis","authors":"J. Pura, Xuechan Li, Cliburn Chan, Jichun Xie","doi":"10.1214/22-aoas1645","DOIUrl":"https://doi.org/10.1214/22-aoas1645","url":null,"abstract":"In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine- to coarse-resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. 
Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model selection uncertainty and stability in beta regression models: A study of bootstrap-based model averaging with an empirical application to clickstream data","authors":"Corban Allenbrand, Ben Sherwood","doi":"10.1214/22-aoas1647","DOIUrl":"https://doi.org/10.1214/22-aoas1647","url":null,"abstract":"Statistical model development is a central feature of many scientific investigations with a vast methodological landscape. However, uncertainty in the model development process has received less attention and is frequently resolved non-rigorously through beliefs about generalizability, practical usefulness, and computational ease. This is particularly problematic in settings of abundant data, such as clickstream data, as model selection routinely admits multiple models and imposes a source of uncertainty, unacknowledged and unknown by many, on all post-selection conclusions. Regression models based on the beta distribution are a class of non-linear models, attractive because of their great flexibility and potential explanatory power, but have not been investigated from the standpoint of multi-model uncertainty and model averaging. For this reason, a formalized tool that can combine model selection uncertainty and beta regression modeling is presented in this work. The tool combines bootstrap model averaging, model selection, and asymptotic theory to yield a procedure that can perform joint modeling of the mean and precision parameters, capture sources of variability in the data, and achieve more accurate claims of estimate precision, variable importance, generalization performance, and model stability. Practical utility of the tool is demonstrated through a study of model selection consistency and variable importance in average exit and bounce rate statistical models. 
This work emphasizes the necessity of a departure from the all-too-common practice of ignoring model selection uncertainty and introduces an accessible technique to handle frequently neglected aspects of the modeling pipeline.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114157175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}