{"title":"Score-Driven Modeling of Spatio-Temporal Data.","authors":"Francesca Gasperoni, Alessandra Luati, Lucia Paci, Enzo D'Innocenzo","doi":"10.1080/01621459.2021.1970571","DOIUrl":"https://doi.org/10.1080/01621459.2021.1970571","url":null,"abstract":"<p><p>A simultaneous autoregressive score-driven model with autoregressive disturbances is developed for spatio-temporal data that may exhibit heavy tails. The model specification rests on a signal plus noise decomposition of a spatially filtered process, where the signal can be approximated by a nonlinear function of the past variables and a set of explanatory variables, while the noise follows a multivariate Student-t distribution. The key feature of the model is that the dynamics of the space-time varying signal are driven by the score of the conditional likelihood function. When the distribution is heavy-tailed, the score provides a robust update of the space-time varying location. Consistency and asymptotic normality of maximum likelihood estimators are derived along with the stochastic properties of the model. The motivating application of the proposed model comes from brain scans recorded through functional magnetic resonance imaging when subjects are at rest and not expected to react to any controlled stimulus. 
We identify spontaneous activations in brain regions as extreme values of a possibly heavy-tailed distribution, by accounting for spatial and temporal dependence.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 542","pages":"1066-1077"},"PeriodicalIF":3.7,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614622/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9590399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data.","authors":"Francesco Denti, Federico Camerlenghi, Michele Guindani, Antonietta Mira","doi":"10.1080/01621459.2021.1933499","DOIUrl":"10.1080/01621459.2021.1933499","url":null,"abstract":"<p><p>The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested common atoms model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. 
We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"405-416"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/01621459.2021.1933499","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9380283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generalized Integration Approach to Association Analysis with Multi-category Outcome: An Application to a Tumor Sequencing Study of Colorectal Cancer and Smoking.","authors":"Jiayin Zheng, Xinyuan Dong, Christina C Newton, Li Hsu","doi":"10.1080/01621459.2022.2105703","DOIUrl":"10.1080/01621459.2022.2105703","url":null,"abstract":"<p><p>Cancer is a heterogeneous disease, and rapid progress in sequencing and -omics technologies has enabled researchers to characterize tumors comprehensively. This has stimulated an intensive interest in studying how risk factors are associated with various tumor heterogeneous features. The Cancer Prevention Study-II (CPS-II) cohort is one of the largest prospective studies, particularly valuable for elucidating associations between cancer and risk factors. In this paper, we investigate the association of smoking with novel colorectal tumor markers obtained from targeted sequencing. However, due to cost and logistic difficulties, only a limited number of tumors can be assayed, which limits our capability for studying these associations. Meanwhile, there are extensive studies for assessing the association of smoking with overall cancer risk and established colorectal tumor markers. Importantly, such summary information is readily available from the literature. By linking this summary information to parameters of interest with proper constraints, we develop a generalized integration approach for polytomous logistic regression model with outcome characterized by tumor features. The proposed approach gains the efficiency through maximizing the joint likelihood of individual-level tumor data and external summary information under the constraints that narrow the parameter searching space. 
We apply the proposed method to the CPS-II data and identify the association of smoking with colorectal cancer risk differing by the mutational status of APC and RNF43 genes, neither of which is identified by the conventional analysis of CPS-II individual data only. These results help better understand the role of smoking in the etiology of colorectal cancer.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"29-42"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168026/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9491224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes.","authors":"T Tony Cai, Zijian Guo, Rong Ma","doi":"10.1080/01621459.2021.1990769","DOIUrl":"10.1080/01621459.2021.1990769","url":null,"abstract":"<p><p>This paper develops a unified statistical inference framework for high-dimensional binary generalized linear models (GLMs) with general link functions. Both unknown and known design distribution settings are considered. A two-step weighted bias-correction method is proposed for constructing confidence intervals and simultaneous hypothesis tests for individual components of the regression vector. Minimax lower bound for the expected length is established and the proposed confidence intervals are shown to be rate-optimal up to a logarithmic factor. The numerical performance of the proposed procedure is demonstrated through simulation studies and an analysis of a single cell RNA-seq data set, which yields interesting biological insights that integrate well into the current literature on the cellular immune response mechanisms as characterized by single-cell transcriptomics. The theoretical analysis provides important insights on the adaptivity of optimal confidence intervals with respect to the sparsity of the regression vector. 
New lower bound techniques are introduced and they can be of independent interest to solve other inference problems in high-dimensional binary GLMs.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 542","pages":"1319-1332"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10292730/pdf/nihms-1824949.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9716114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>iProMix</i>: A mixture model for studying the function of ACE2 based on bulk proteogenomic data.","authors":"Xiaoyu Song, Jiayi Ji, Pei Wang","doi":"10.1080/01621459.2022.2110876","DOIUrl":"https://doi.org/10.1080/01621459.2022.2110876","url":null,"abstract":"<p><p>Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused over six million deaths in the ongoing COVID-19 pandemic. SARS-CoV-2 uses ACE2 protein to enter human cells, raising a pressing need to characterize proteins/pathways interacted with ACE2. Large-scale proteomic profiling technology is not mature at single-cell resolution to examine the protein activities in disease-relevant cell types. We propose <i>iProMix</i>, a novel statistical framework to identify epithelial-cell specific associations between ACE2 and other proteins/pathways with bulk proteomic data. <i>iProMix</i> decomposes the data and models cell-type-specific conditional joint distribution of proteins through a mixture model. It improves cell-type composition estimation from prior input, and utilizes a non-parametric inference framework to account for uncertainty of cell-type proportion estimates in hypothesis test. Simulations demonstrate <i>iProMix</i> has well-controlled false discovery rates and favorable powers in non-asymptotic settings. We apply <i>iProMix</i> to the proteomic data of 110 (tumor adjacent) normal lung tissue samples from the Clinical Proteomic Tumor Analysis Consortium lung adenocarcinoma study, and identify interferon <i>α</i>/<i>γ</i> response pathways as the most significant pathways associated with ACE2 protein abundances in epithelial cells. Strikingly, the association direction is sex-specific. 
This result casts light on the sex difference of COVID-19 incidences and outcomes, and motivates sex-specific evaluation for interferon therapies.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"43-55"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10321538/pdf/nihms-1841220.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9859882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convex and Nonconvex Optimization Are Both Minimax-Optimal for Noisy Blind Deconvolution under Random Designs.","authors":"Yuxin Chen, Jianqing Fan, Bingyan Wang, Yuling Yan","doi":"10.1080/01621459.2021.1956501","DOIUrl":"https://doi.org/10.1080/01621459.2021.1956501","url":null,"abstract":"<p><p>We investigate the effectiveness of convex relaxation and nonconvex optimization in solving bilinear systems of equations under two different designs (i.e. a sort of random Fourier design and Gaussian design). Despite the wide applicability, the theoretical understanding about these two paradigms remains largely inadequate in the presence of random noise. The current paper makes two contributions by demonstrating that: (1) a two-stage nonconvex algorithm attains minimax-optimal accuracy within a logarithmic number of iterations, and (2) convex relaxation also achieves minimax-optimal statistical accuracy vis-à-vis random noise. Both results significantly improve upon the state-of-the-art theoretical guarantees.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 542","pages":"858-868"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/01621459.2021.1956501","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10094943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Filtering the rejection set while preserving false discovery rate control.","authors":"Eugene Katsevich, Chiara Sabatti, Marina Bogomolov","doi":"10.1080/01621459.2021.1920958","DOIUrl":"10.1080/01621459.2021.1920958","url":null,"abstract":"<p><p>Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the <i>p</i>-values are positively dependent. 
We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"165-176"},"PeriodicalIF":3.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10281705/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9702573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discussion of \"LESA: Longitudinal Elastic Shape Analysis of Brain Subcortical Structures\".","authors":"Daiwei Zhang, Jian Kang","doi":"10.1080/01621459.2022.2123334","DOIUrl":"10.1080/01621459.2022.2123334","url":null,"abstract":"","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"22-24"},"PeriodicalIF":3.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10085561/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9977528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Association Between Two Event Times with Observations Subject to Informative Censoring.","authors":"Dongdong Li, X Joan Hu, Rui Wang","doi":"10.1080/01621459.2021.1990766","DOIUrl":"10.1080/01621459.2021.1990766","url":null,"abstract":"<p><p>This article is concerned with evaluating the association between two event times without specifying the joint distribution parametrically. This is particularly challenging when the observations on the event times are subject to informative censoring due to a terminating event such as death. There are few methods suitable for assessing covariate effects on association in this context. We link the joint distribution of the two event times and the informative censoring time using a nested copula function. We use flexible functional forms to specify the covariate effects on both the marginal and joint distributions. In a semiparametric model for the bivariate event time, we estimate simultaneously the association parameters, the marginal survival functions, and the covariate effects. A byproduct of the approach is a consistent estimator for the induced marginal survival function of each event time conditional on the covariates. We develop an easy-to-implement pseudolikelihood-based inference procedure, derive the asymptotic properties of the estimators, and conduct simulation studies to examine the finite-sample performance of the proposed approach. For illustration, we apply our method to analyze data from the breast cancer survivorship study that motivated this research. 
Supplementary materials for this article are available online.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 542","pages":"1282-1294"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10259842/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10011535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages.","authors":"Ines Wilms, Sumanta Basu, Jacob Bien, David S Matteson","doi":"10.1080/01621459.2021.1942013","DOIUrl":"10.1080/01621459.2021.1942013","url":null,"abstract":"<p><p>The Vector AutoRegressive Moving Average (VARMA) model is fundamental to the theory of multivariate time series; however, identifiability issues have led practitioners to abandon it in favor of the simpler but more restrictive Vector AutoRegressive (VAR) model. We narrow this gap with a new optimization-based approach to VARMA identification built upon the principle of parsimony. Among all equivalent data-generating models, we use convex optimization to seek the parameterization that is simplest in a certain sense. A user-specified strongly convex penalty is used to measure model simplicity, and that same penalty is then used to define an estimator that can be efficiently computed. We establish consistency of our estimators in a double-asymptotic regime. Our non-asymptotic error bound analysis accommodates both model specification and parameter estimation steps, a feature that is crucial for studying large-scale VARMA algorithms. Our analysis also provides new results on penalized estimation of infinite-order VAR, and elastic net regression under a singular covariance structure of regressors, which may be of independent interest. 
We illustrate the advantage of our method over VAR alternatives on three real data examples.</p>","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"571-582"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/01621459.2021.1942013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9702571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}