Zhengwu Zhang, Yuexuan Wu, D. Xiong, J. G. Ibrahim, A. Srivastava, Hongtu Zhu
{"title":"Rejoinder: LESA: Longitudinal Elastic Shape Analysis of Brain Subcortical Structures","authors":"Zhengwu Zhang, Yuexuan Wu, D. Xiong, J. G. Ibrahim, A. Srivastava, Hongtu Zhu","doi":"10.1080/01621459.2022.2139264","DOIUrl":"https://doi.org/10.1080/01621459.2022.2139264","url":null,"abstract":"Brain surfaces, both cortical and subcortical (including the hippocampus), have been analyzed for more than a decade using publicly available software packages such as FreeSurfer (Dale & Fischl 1999) and SurfStat (Worsley et al. 2009, Chung et al. 2010). Zhang et al. (2022) propose an elastic-shape-metric-based method for performing longitudinal shape analysis on brain subcortical structures. However, the demonstrated applications are limited to global summary measures such as the total surface area and principal component (PC) scores, significantly limiting the impact of the study. For analyzing total surface area, we do not even need to align structures using LESA. PC scores lose richer vertex-based local information, and it is unclear what parts of the hippocampus are responsible for longitudinal change. A more effective approach is to perform local shape analysis using deformation-based morphometry (DBM) and tensor-based morphometry (TBM) after obtaining the deformation in LESA (Ashburner et al. 1998, 2000, Thompson et al. 2000). Considering that elastic methods put severe constraints on the Jacobian determinant of image deformation (Chung et al. 2001), it is not clear that LESA can be effectively used in local shape analysis. We contrast the shape analysis done in Zhang et al. (2022) against DBM and TBM in a longitudinal hippocampus study (Chung et al. 2011). 
The deformation-based morphometry (DBM) utilizes the deformation field obtained","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 1","pages":"25 - 28"},"PeriodicalIF":3.7,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43933828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: What Makes for a Great Applications and Case Studies Paper?","authors":"M. Stein","doi":"10.1080/01621459.2023.2173458","DOIUrl":"https://doi.org/10.1080/01621459.2023.2173458","url":null,"abstract":"One of the most common reasons a submission to JASA Applications and Case Studies is rejected is that it is deemed inappropriate for this section of the journal. If we look for guidance in the journal’s instructions for authors, the opening sentence of the section on Applications and Case Studies states, “The Applications and Case Studies section publishes original articles that cogently demonstrate statistical usage in applications from any research area.” In contrast, the instructions for Theory and Methods papers say, “The research reported should be motivated by a scientific or practical problem and, ideally, illustrated by application of the proposed methodology to that problem. Illustration of techniques with real data is especially welcomed and strongly encouraged.” Many potential authors may find the distinctions between these two statements difficult to discern, which perhaps partly explains the high frequency of submissions rejected for being inappropriate. This editorial is an attempt to clarify what I think these distinctions are. In particular, in what ways is a paper with new methodology motivated by a “scientific or practical problem” and illustrated with “real data” not necessarily appropriate for Applications and Case Studies? First, let me be clear that the views expressed here are my own and are not part of official journal policy. Every editor is the final arbiter of what papers should be published in the journal and I think it is appropriate and even desirable that different editors use somewhat different criteria in making these decisions. 
Nevertheless, it is my hope that authors, referees, associate editors, and future editors will find it helpful for me to spell out in greater detail than is appropriate for a journal’s website some of the things I look for when evaluating a submission. There is not and should not be a clear and wide dividing line between Applications and Case Studies papers and Theory and Methods papers. Nevertheless, the use of the word “illustration” in the instructions for Theory and Methods papers points at a key distinction. An illustrative example possesses a feature that a proposed methodology is meant to address. The resulting data analysis may be rather brief, focusing on how the methodology can handle this feature better than previously proposed methods. The example thus serves in a supporting role, with the novel methodology and possibly accompanying theory being the main research contributions. In contrast, the specific application plays a much more prominent role in an Applications and Case Studies paper. A typical Applications and Case Studies paper begins with a description of the applied problem, generally one of current scientific or policy interest, which then leads into a discussion of the proposed methodology. Note that while most Applications and Case Studies papers include novel methodology, that is not a requirement for publication. 
For example, a paper that adapts existing methodolo","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"2008 25","pages":"1 - 2"},"PeriodicalIF":3.7,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41262583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francesco Denti, Federico Camerlenghi, Michele Guindani, Antonietta Mira
{"title":"A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data.","authors":"Francesco Denti, Federico Camerlenghi, Michele Guindani, Antonietta Mira","doi":"10.1080/01621459.2021.1933499","DOIUrl":"10.1080/01621459.2021.1933499","url":null,"abstract":"The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested common atoms model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. 
We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"405-416"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/01621459.2021.1933499","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9380283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiayin Zheng, Xinyuan Dong, Christina C Newton, Li Hsu
{"title":"A Generalized Integration Approach to Association Analysis with Multi-category Outcome: An Application to a Tumor Sequencing Study of Colorectal Cancer and Smoking.","authors":"Jiayin Zheng, Xinyuan Dong, Christina C Newton, Li Hsu","doi":"10.1080/01621459.2022.2105703","DOIUrl":"10.1080/01621459.2022.2105703","url":null,"abstract":"Cancer is a heterogeneous disease, and rapid progress in sequencing and -omics technologies has enabled researchers to characterize tumors comprehensively. This has stimulated intense interest in studying how risk factors are associated with various heterogeneous tumor features. The Cancer Prevention Study-II (CPS-II) cohort is one of the largest prospective studies, particularly valuable for elucidating associations between cancer and risk factors. In this paper, we investigate the association of smoking with novel colorectal tumor markers obtained from targeted sequencing. However, due to cost and logistical difficulties, only a limited number of tumors can be assayed, which limits our capability for studying these associations. Meanwhile, there are extensive studies assessing the association of smoking with overall cancer risk and established colorectal tumor markers. Importantly, such summary information is readily available from the literature. By linking this summary information to parameters of interest with proper constraints, we develop a generalized integration approach for the polytomous logistic regression model with outcomes characterized by tumor features. The proposed approach gains efficiency by maximizing the joint likelihood of individual-level tumor data and external summary information under constraints that narrow the parameter search space. 
We apply the proposed method to the CPS-II data and identify associations of smoking with colorectal cancer risk that differ by the mutational status of the APC and RNF43 genes, neither of which is identified by the conventional analysis of CPS-II individual-level data alone. These results help better understand the role of smoking in the etiology of colorectal cancer.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"29-42"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168026/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9491224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes.","authors":"T Tony Cai, Zijian Guo, Rong Ma","doi":"10.1080/01621459.2021.1990769","DOIUrl":"10.1080/01621459.2021.1990769","url":null,"abstract":"This paper develops a unified statistical inference framework for high-dimensional binary generalized linear models (GLMs) with general link functions. Both unknown and known design distribution settings are considered. A two-step weighted bias-correction method is proposed for constructing confidence intervals and simultaneous hypothesis tests for individual components of the regression vector. A minimax lower bound for the expected length is established, and the proposed confidence intervals are shown to be rate-optimal up to a logarithmic factor. The numerical performance of the proposed procedure is demonstrated through simulation studies and an analysis of a single-cell RNA-seq data set, which yields interesting biological insights that integrate well into the current literature on the cellular immune response mechanisms as characterized by single-cell transcriptomics. The theoretical analysis provides important insights into the adaptivity of optimal confidence intervals with respect to the sparsity of the regression vector. 
New lower bound techniques are introduced, and they may be of independent interest for solving other inference problems in high-dimensional binary GLMs.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 542","pages":"1319-1332"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10292730/pdf/nihms-1824949.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9716114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Inference in Transcriptome-Wide Association Studies with Invalid Instruments and GWAS Summary Data.","authors":"Haoran Xue, Xiaotong Shen, Wei Pan","doi":"10.1080/01621459.2023.2183127","DOIUrl":"10.1080/01621459.2023.2183127","url":null,"abstract":"Transcriptome-wide association studies (TWAS) have recently emerged as a popular tool to discover (putative) causal genes by integrating an outcome GWAS dataset with another gene expression/transcriptome GWAS (called eQTL) dataset. In our motivating and target application, we would like to identify causal genes for low-density lipoprotein cholesterol (LDL), which is crucial for developing new treatments for hyperlipidemia and cardiovascular diseases. The statistical principle underlying TWAS is (two-sample) two-stage least squares (2SLS) using multiple correlated SNPs as instrumental variables (IVs); it is closely related to typical (two-sample) Mendelian randomization (MR) using independent SNPs as IVs, which is expected to be impractical and lower-powered for TWAS (and some other) applications. However, often some of the SNPs used may not be valid IVs, e.g., due to the widespread pleiotropy of their direct effects on the outcome not mediated through the gene of interest, leading to false conclusions by TWAS (or MR). Building on recent advances in sparse regression, we propose a robust and efficient inferential method to account for both hidden confounding and some invalid IVs via two-stage constrained maximum likelihood (2ScML), an extension of 2SLS. We first develop the proposed method with individual-level data, then extend it both theoretically and computationally to GWAS summary data for the most popular two-sample TWAS design, to which, however, almost all existing robust IV regression methods are not applicable. 
We show that the proposed method achieves asymptotically valid statistical inference on causal effects, demonstrating its wider applicability and superior finite-sample performance over the standard 2SLS/TWAS (and MR). We apply the methods to identify putative causal genes for LDL by integrating large-scale lipid GWAS summary data with eQTL data.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 543","pages":"1525-1537"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557939/pdf/nihms-1877198.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41116082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tukey's Depth for Object Data.","authors":"Xiongtao Dai, Sara Lopez-Pintado","doi":"10.1080/01621459.2021.2011298","DOIUrl":"10.1080/01621459.2021.2011298","url":null,"abstract":"We develop a novel exploratory tool for non-Euclidean object data based on data depth, extending the celebrated Tukey's depth for Euclidean data. The proposed metric halfspace depth, applicable to data objects in a general metric space, assigns to data points depth values that characterize the centrality of these points with respect to the distribution and provides an interpretable center-outward ranking. Desirable theoretical properties that generalize standard depth properties postulated for Euclidean data are established for the metric halfspace depth. The depth median, defined as the deepest point, is shown to have high robustness as a location descriptor both in theory and in simulation. We propose an efficient algorithm to approximate the metric halfspace depth and illustrate its ability to adapt to the intrinsic data geometry. The metric halfspace depth was applied to an Alzheimer's disease study, revealing group differences in brain connectivity, modeled as covariance matrices, for subjects in different stages of dementia. 
Based on phylogenetic trees of 7 pathogenic parasites, our proposed metric halfspace depth was also used to construct a meaningful consensus estimate of the evolutionary history and to identify potential outlier trees.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 543","pages":"1760-1772"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10545316/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41148821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>iProMix</i>: A mixture model for studying the function of ACE2 based on bulk proteogenomic data.","authors":"Xiaoyu Song, Jiayi Ji, Pei Wang","doi":"10.1080/01621459.2022.2110876","DOIUrl":"https://doi.org/10.1080/01621459.2022.2110876","url":null,"abstract":"Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused over six million deaths in the ongoing COVID-19 pandemic. SARS-CoV-2 uses ACE2 protein to enter human cells, raising a pressing need to characterize proteins/pathways that interact with ACE2. Large-scale proteomic profiling technology is not yet mature enough at single-cell resolution to examine the protein activities in disease-relevant cell types. We propose <i>iProMix</i>, a novel statistical framework to identify epithelial-cell-specific associations between ACE2 and other proteins/pathways with bulk proteomic data. <i>iProMix</i> decomposes the data and models the cell-type-specific conditional joint distribution of proteins through a mixture model. It improves cell-type composition estimation from prior input, and utilizes a non-parametric inference framework to account for uncertainty of cell-type proportion estimates in hypothesis testing. Simulations demonstrate <i>iProMix</i> has well-controlled false discovery rates and favorable power in non-asymptotic settings. We apply <i>iProMix</i> to the proteomic data of 110 (tumor-adjacent) normal lung tissue samples from the Clinical Proteomic Tumor Analysis Consortium lung adenocarcinoma study, and identify interferon <i>α</i>/<i>γ</i> response pathways as the most significant pathways associated with ACE2 protein abundances in epithelial cells. Strikingly, the association direction is sex-specific. 
This result sheds light on sex differences in COVID-19 incidence and outcomes, and motivates sex-specific evaluation of interferon therapies.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"43-55"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10321538/pdf/nihms-1841220.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9859882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuxin Chen, Jianqing Fan, Bingyan Wang, Yuling Yan
{"title":"Convex and Nonconvex Optimization Are Both Minimax-Optimal for Noisy Blind Deconvolution under Random Designs.","authors":"Yuxin Chen, Jianqing Fan, Bingyan Wang, Yuling Yan","doi":"10.1080/01621459.2021.1956501","DOIUrl":"https://doi.org/10.1080/01621459.2021.1956501","url":null,"abstract":"We investigate the effectiveness of convex relaxation and nonconvex optimization in solving bilinear systems of equations under two different designs (i.e., a type of random Fourier design and a Gaussian design). Despite their wide applicability, the theoretical understanding of these two paradigms remains largely inadequate in the presence of random noise. The current paper makes two contributions by demonstrating that: (1) a two-stage nonconvex algorithm attains minimax-optimal accuracy within a logarithmic number of iterations, and (2) convex relaxation also achieves minimax-optimal statistical accuracy vis-à-vis random noise. Both results significantly improve upon the state-of-the-art theoretical guarantees.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 542","pages":"858-868"},"PeriodicalIF":3.7,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/01621459.2021.1956501","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10094943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Filtering the rejection set while preserving false discovery rate control.","authors":"Eugene Katsevich, Chiara Sabatti, Marina Bogomolov","doi":"10.1080/01621459.2021.1920958","DOIUrl":"10.1080/01621459.2021.1920958","url":null,"abstract":"Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the <i>p</i>-values are positively dependent. 
We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.","PeriodicalId":17227,"journal":{"name":"Journal of the American Statistical Association","volume":"118 541","pages":"165-176"},"PeriodicalIF":3.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10281705/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9702573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}