{"title":"A Stability Framework for Parameter Selection in the Minimum Covariance Determinant Problem","authors":"Qiang Heng, Hui Shen, Kenneth Lange","doi":"10.1080/10618600.2025.2495780","DOIUrl":"https://doi.org/10.1080/10618600.2025.2495780","url":null,"abstract":"","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":"7 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144193804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensitivity Analysis for Binary Outcome Misclassification in Randomization Tests via Integer Programming.","authors":"Siyu Heng, Pamela A Shaw","doi":"10.1080/10618600.2025.2461222","DOIUrl":"https://doi.org/10.1080/10618600.2025.2461222","url":null,"abstract":"<p>Conducting a randomization test is a common method for testing causal null hypotheses in randomized experiments. The popularity of randomization tests is largely because their statistical validity only depends on the randomization design, and no distributional or modeling assumption on the outcome variable is needed. However, randomization tests may still suffer from other sources of bias, among which outcome misclassification is a significant one. We propose a model-free and finite-population sensitivity analysis approach for binary outcome misclassification in randomization tests. A central quantity in our framework is \"warning accuracy,\" defined as the threshold such that a randomization test result based on the measured outcomes may differ from that based on the true outcomes if the outcome measurement accuracy did not surpass that threshold. We show how learning the warning accuracy and related concepts can amplify analyses of randomization tests subject to outcome misclassification without adding additional assumptions. We show that the warning accuracy can be computed efficiently for large data sets by adaptively reformulating a large-scale integer program with respect to the randomization design. We apply the proposed approach to the Prostate Cancer Prevention Trial (PCPT). We also developed an open-source R package for implementation of our approach.</p>","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377470/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Powerful significance testing for unbalanced clusters.","authors":"Thomas H Keefe, J S Marron","doi":"10.1080/10618600.2025.2469756","DOIUrl":"https://doi.org/10.1080/10618600.2025.2469756","url":null,"abstract":"<p>Clustering methods are popular for revealing structure in data, particularly in the high-dimensional setting common to contemporary data science. A central <i>statistical</i> question is \"are the clusters really there?\" One pioneering method in statistical cluster validation is <i>SigClust</i>, but it is severely underpowered in the important setting where the candidate clusters have unbalanced sizes, such as in rare subtypes of disease. We show why this is the case and propose a remedy that is powerful in both the unbalanced and balanced settings, using a novel generalization of <i>k</i>-means clustering. We illustrate the value of our method using a high-dimensional dataset of gene expression in kidney cancer patients. A Python implementation is available at https://github.com/thomaskeefe/sigclust.</p>","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12338451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProSpar-GP: Scalable Gaussian Process Modeling with Massive Nonstationary Datasets","authors":"Kevin Li, Simon Mak","doi":"10.1080/10618600.2025.2490264","DOIUrl":"https://doi.org/10.1080/10618600.2025.2490264","url":null,"abstract":"","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":"58 3 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144193805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dependence-based fuzzy clustering of functional time series","authors":"Ángel López-Oriona, Ying Sun, Han Lin Shang","doi":"10.1080/10618600.2025.2489537","DOIUrl":"https://doi.org/10.1080/10618600.2025.2489537","url":null,"abstract":"","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":"137 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144193807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Time-Evolving Networks Using Temporal Exponential-Family Random Graph Models with Conditional Dyadic Independence and Dynamic Latent Blocks","authors":"Amal Agarwal, Kevin H. Lee, Lingzhou Xue","doi":"10.1080/10618600.2025.2484011","DOIUrl":"https://doi.org/10.1080/10618600.2025.2484011","url":null,"abstract":"","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":"142 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144193428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extrapolation before imputation reduces bias when imputing censored covariates.","authors":"Sarah C Lotspeich, Tanya P Garcia","doi":"10.1080/10618600.2024.2444323","DOIUrl":"https://doi.org/10.1080/10618600.2024.2444323","url":null,"abstract":"<p>Modeling symptom progression to identify ideal subjects for a Huntington's disease clinical trial is problematic since time to diagnosis, a key covariate, can be heavily censored. Imputation is an appealing strategy that replaces the censored covariate with its conditional mean, but existing methods saw over 200% bias under heavy censoring. Calculating conditional means well requires estimating and then integrating over the survival function of the censored covariate from the censored value to infinity. To estimate the survival function flexibly, existing methods use the semiparametric Cox model with Breslow's estimator, leaving the integrand for the conditional means (the survival function) undefined beyond the observed data. The integral is then estimated up to the largest observed covariate value, and this approximation can cut off the tail of the survival function and lead to severe bias. We combine the semiparametric survival estimator with a parametric extension to approximate the integral up to infinity. In simulations, our proposed extrapolation-before-imputation approach substantially reduces the bias seen with existing imputation methods, sometimes even when the parametric extension was misspecified. We further demonstrate how imputing with corrected conditional means can prioritize subjects for clinical trials. The R code to reproduce results is available in the Supplementary Material.</p>","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435536/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145075381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Majorization-Minimization Gauss-Newton Method for 1-Bit Matrix Completion.","authors":"Xiaoqian Liu, Xu Han, Eric C Chi, Boaz Nadler","doi":"10.1080/10618600.2024.2428610","DOIUrl":"https://doi.org/10.1080/10618600.2024.2428610","url":null,"abstract":"<p>In 1-bit matrix completion, the aim is to estimate an underlying low-rank matrix from a partial set of binary observations. We propose a novel method for 1-bit matrix completion called Majorization-Minimization Gauss-Newton (MMGN). Our method is based on the majorization-minimization principle, which converts the original optimization problem into a sequence of standard low-rank matrix completion problems. We solve each of these subproblems by a factorization approach that explicitly enforces the assumed low-rank structure and then apply a Gauss-Newton method. Using simulations and a real data example, we illustrate that in comparison to existing 1-bit matrix completion methods, MMGN outputs comparable if not more accurate estimates. In addition, it is often significantly faster, and less sensitive to the spikiness of the underlying matrix. In comparison with three standard generic optimization approaches that directly minimize the original objective, MMGN also exhibits a clear computational advantage, especially when the fraction of observed entries is small.</p>","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12327443/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Block Diagonal Covariance Structure Detection Using Singular Vectors","authors":"Jan O. Bauer","doi":"10.1080/10618600.2024.2422985","DOIUrl":"https://doi.org/10.1080/10618600.2024.2422985","url":null,"abstract":"The assumption of independent subvectors arises in many aspects of multivariate analysis. In most real-world applications, however, we lack prior knowledge about the number of subvectors and the sp...","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":"12 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142589065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task Learning for Gaussian Graphical Regressions with High Dimensional Covariates","authors":"Jingfei Zhang, Yi Li","doi":"10.1080/10618600.2024.2421246","DOIUrl":"https://doi.org/10.1080/10618600.2024.2421246","url":null,"abstract":"Gaussian graphical regression is a powerful approach for regressing the precision matrix of a Gaussian graphical model on covariates, which permits the response variables and covariates to outnumbe...","PeriodicalId":15422,"journal":{"name":"Journal of Computational and Graphical Statistics","volume":"5 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142589068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}