The Annals of Applied Statistics: Latest Articles, Page 10

Many-to-One indirect sampling with application to the French postal traffic estimation

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1653

Estelle Medous, C. Goga, A. Ruiz-Gazen, J. Beaumont, A. Dessertaine, Pauline Puech

{"title":"Many-to-One indirect sampling with application to the French postal traffic estimation","authors":"Estelle Medous, C. Goga, A. Ruiz-Gazen, J. Beaumont, A. Dessertaine, Pauline Puech","doi":"10.1214/22-aoas1653","DOIUrl":"https://doi.org/10.1214/22-aoas1653","url":null,"abstract":"In social and economic surveys, it can be difficult to directly reach units of the target population, and indirect sampling is often advocated to solve this issue. In indirect sampling, the sample is drawn from a frame population that is linked to the target population, and estimation of target population parameters is typically achieved through the Generalized Weight Share Method (GWSM). This method provides a weight, for every unit of the target population, that depends, on the one hand, on the sampling weights in the frame population and, on the other hand, on the link weights between the frame population and the target population. In the present study, we focus on the situation in which the units from the frame population are linked to one and only one unit from the target population (Many-to-One case). This situation is encountered at the French postal service where addresses are sampled instead of postman rounds. We aim to understand the impact of the link weights on the efficiency of the GWSM estimators. We derive variance expressions and optimality results for a large class of sampling designs. Moreover, we note that the Many-to-One case can lead to too many links to observe. We alleviate the problem by introducing an intermediate population and double indirect sampling. The question of the loss of precision in this situation is discussed in detail through theoretical results and simulations. These findings help to explain the loss of precision of double GWSM estimators observed recently at the French postal service.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127903140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

An extension of estimating equations to model longitudinal medical cost trajectory with Medicare claims data linked to SEER cancer registry

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1659

S. Wang, J. Ning, Ying Xu, Y. Shih, Yu Shen, Liang Li

Citations: 0

Sequential sampling in prospective observational studies

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1620

Mary M Ryan, D. Gillen

Citations: 0

PALM: Patient-centered treatment ranking via large-scale multivariate network meta-analysis

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1652

Rui Duan, Jiayi Tong, Lifeng Lin, Lisa Levine, Mary Sammel, Joel Stoddard, Tianjing Li, Christopher H Schmid, Haitao Chu, Yong Chen

{"title":"PALM: Patient-centered treatment ranking via large-scale multivariate network meta-analysis","authors":"Rui Duan, Jiayi Tong, Lifeng Lin, Lisa Levine, Mary Sammel, Joel Stoddard, Tianjing Li, Christopher H Schmid, Haitao Chu, Yong Chen","doi":"10.1214/22-aoas1652","DOIUrl":"https://doi.org/10.1214/22-aoas1652","url":null,"abstract":"The growing number of available treatment options has led to an urgent need for reliable answers when choosing the best course of treatment for a patient. As it is often infeasible to compare a large number of treatments in a single randomized controlled trial, multivariate network meta-analyses (NMAs) are used to synthesize evidence from trials of a subset of the treatments, where both efficacy and safety-related outcomes are considered simultaneously. However, these large-scale multiple-outcome NMAs have created challenges for existing methods due to the increasing complexity of the unknown correlations between outcomes and treatment comparisons. In this paper, we propose a new framework for PAtient-centered treatment ranking via Large-scale Multivariate network meta-analysis, termed PALM, which includes a parsimonious modeling approach, a fast algorithm for parameter estimation and inference, a novel visualization tool for presenting multivariate outcomes, termed the origami plot, as well as personalized treatment ranking procedures taking into account the individual’s considerations on multiple outcomes. In application to an NMA that compares 14 treatment options for labor induction, we provide a comprehensive illustration of the proposed framework, demonstrate its computational efficiency and practicality, and obtain new insights and evidence to support patient-centered clinical decision making.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136173522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Subject-specific Dirichlet-multinomial regression for multi-district microbiota data analysis

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1641

M. Pedone, A. Amedei, F. Stingo

Citations: 0

Control charts for dynamic process monitoring with an application to air pollution surveillance

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1615

Xiulin Xie, P. Qiu

{"title":"Control charts for dynamic process monitoring with an application to air pollution surveillance","authors":"Xiulin Xie, P. Qiu","doi":"10.1214/22-aoas1615","DOIUrl":"https://doi.org/10.1214/22-aoas1615","url":null,"abstract":"Air pollution is a major global public health risk factor. Among all air pollutants, PM2.5 is especially harmful. It has been well demonstrated that chronic exposure to PM2.5 can cause many health problems, including asthma, lung cancer and cardiovascular diseases. To tackle problems caused by air pollution, governments have devoted substantial resources to improving air quality and reducing the impact of air pollution on public health. In this effort, it is extremely important to develop an air pollution surveillance system that constantly monitors air quality over time and gives a signal promptly once the air quality is found to deteriorate, so that a timely government intervention can be implemented. To monitor a sequential process, a major statistical tool is the statistical process control (SPC) chart. However, traditional SPC charts are based on the assumptions that process observations at different time points are independent and identically distributed. These assumptions are rarely valid in environmental data because seasonality and serial correlation are common in such data. To overcome this difficulty, we suggest a new control chart in this paper, which can properly accommodate dynamic temporal patterns and serial correlation in a sequential process. Thus, it can be used for effective air pollution surveillance. This method is demonstrated by an application to monitor the daily average PM2.5 levels in Beijing, and shown to be effective and reliable in detecting the increase of PM2.5 levels.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129205101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 2

Generalized theme dictionary models for association pattern discovery

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1626

Yang Yang, Ke Deng

Citations: 0

Bayesian clustering of spatial functional data with application to a human mobility study during COVID-19

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1643

Bohai Zhang, H. Sang, Z. Luo, Hui Huang

{"title":"Bayesian clustering of spatial functional data with application to a human mobility study during COVID-19","authors":"Bohai Zhang, H. Sang, Z. Luo, Hui Huang","doi":"10.1214/22-aoas1643","DOIUrl":"https://doi.org/10.1214/22-aoas1643","url":null,"abstract":"The coronavirus (COVID-19) global pandemic has made a significant impact on people's social activities. Cell phone mobility data provide unique and rich information for studying this impact. The motivating dataset of this study is the daily leaving-home index data for Harris County, Texas, provided by SafeGraph. To study changes in the daily leaving-home index and how they relate to public policy and sociodemographic variables, we propose a new Bayesian wavelet model for modeling and clustering spatial functional data, where domain partitioning is achieved by operating on the spanning trees. The resulting clusters can have arbitrary shapes and are spatially contiguous in the input domain. An efficient tailored reversible jump Markov chain Monte Carlo algorithm is proposed to implement the model. The method is applied to the spatial functional data of the daily percentages of people who left home. We focus on the time period covering both lockdown and phased reopening in Texas during the COVID-19 pandemic and study the changing behaviors of those functional curves. By linking the clustering results with the sociodemographic information, we identify several covariates of census blocks that have a noticeable impact on the clustering patterns of people's mobility behaviors. © Institute of Mathematical Statistics, 2023.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116039463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

TEAM: A multiple testing algorithm on the aggregation tree for flow cytometry analysis

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1645

J. Pura, Xuechan Li, Cliburn Chan, Jichun Xie

{"title":"TEAM: A multiple testing algorithm on the aggregation tree for flow cytometry analysis","authors":"J. Pura, Xuechan Li, Cliburn Chan, Jichun Xie","doi":"10.1214/22-aoas1645","DOIUrl":"https://doi.org/10.1214/22-aoas1645","url":null,"abstract":"In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine- to coarse-resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 1

Model selection uncertainty and stability in beta regression models: A study of bootstrap-based model averaging with an empirical application to clickstream data

The Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/22-aoas1647

Corban Allenbrand, Ben Sherwood

{"title":"Model selection uncertainty and stability in beta regression models: A study of bootstrap-based model averaging with an empirical application to clickstream data","authors":"Corban Allenbrand, Ben Sherwood","doi":"10.1214/22-aoas1647","DOIUrl":"https://doi.org/10.1214/22-aoas1647","url":null,"abstract":"Statistical model development is a central feature of many scientific investigations with a vast methodological landscape. However, uncertainty in the model development process has received less attention and is frequently resolved non-rigorously through beliefs about generalizability, practical usefulness, and computational ease. This is particularly problematic in settings of abundant data, such as clickstream data, as model selection routinely admits multiple models and imposes a source of uncertainty, unacknowledged and unknown by many, on all post-selection conclusions. Regression models based on the beta distribution are a class of non-linear models, attractive because of their great flexibility and potential explanatory power, but have not been investigated from the standpoint of multi-model uncertainty and model averaging. For this reason, a formalized tool that can combine model selection uncertainty and beta regression modeling is presented in this work. The tool combines bootstrap model averaging, model selection, and asymptotic theory to yield a procedure that can perform joint modeling of the mean and precision parameters, capture sources of variability in the data, and achieve more accurate claims of estimate precision, variable importance, generalization performance, and model stability. Practical utility of the tool is demonstrated through a study of model selection consistency and variable importance in average exit and bounce rate statistical models. This work emphasizes the necessity of a departure from the all-too-common practice of ignoring model selection uncertainty and introduces an accessible technique to handle frequently neglected aspects of the modeling pipeline.","PeriodicalId":188068,"journal":{"name":"The Annals of Applied Statistics","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114157175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 1