Konstantinos Pantazis, Avanti Athreya, Jesús Arroyo, William N Frost, Evan S Hill, Vince Lyzinski
{"title":"The Importance of Being Correlated: Implications of Dependence in Joint Spectral Inference across Multiple Networks.","authors":"Konstantinos Pantazis, Avanti Athreya, Jesús Arroyo, William N Frost, Evan S Hill, Vince Lyzinski","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Spectral inference on multiple networks is a rapidly-developing subfield of graph statistics. Recent work has demonstrated that joint, or simultaneous, spectral embedding of multiple independent networks can deliver more accurate estimation than individual spectral decompositions of those same networks. Such inference procedures typically rely heavily on independence assumptions across the multiple network realizations, and even in this case, little attention has been paid to the induced network correlation that can be a consequence of such joint embeddings. In this paper, we present a <i>generalized omnibus</i> embedding methodology and we provide a detailed analysis of this embedding across both independent and correlated networks, the latter of which significantly extends the reach of such procedures, and we describe how this omnibus embedding can itself induce correlation. This leads us to distinguish between <i>inherent</i> correlation-that is, the correlation that arises naturally in multisample network data-and <i>induced</i> correlation, which is an artifice of the joint embedding methodology. We show that the generalized omnibus embedding procedure is flexible and robust, and we prove both consistency and a central limit theorem for the embedded points. We examine how induced and inherent correlation can impact inference for network time series data, and we provide network analogues of classical questions such as the effective sample size for more generally correlated data. Further, we show how an appropriately calibrated generalized omnibus embedding can detect changes in real biological networks that previous embedding procedures could not discern, confirming that the effect of inherent and induced correlation can be subtle and transformative. By allowing for and deconstructing both forms of correlation, our methodology widens the scope of spectral techniques for network inference, with import in theory and practice.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10465120/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10127031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Sparse Additive Models.","authors":"Asad Haris, Noah Simon, Ali Shojaie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10593424/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49693499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muxuan Liang, Young-Geun Choi, Yang Ning, Maureen A Smith, Ying-Qi Zhao
{"title":"Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score.","authors":"Muxuan Liang, Young-Geun Choi, Yang Ning, Maureen A Smith, Ying-Qi Zhao","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>With the increasing adoption of electronic health records, there is an increasing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational data. However, there is a lack of valid inference procedures for such rules developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal individualized treatment rule from high-dimensional data. We propose a split-and-pooled de-correlated score to construct hypothesis tests and confidence intervals. Our proposal adopts the data splitting to conquer the slow convergence rate of nuisance parameter estimations, such as non-parametric methods for outcome regression or propensity models. We establish the limiting distributions of the split-and-pooled de-correlated score test and the corresponding one-step estimator in high-dimensional setting. Simulation and real data analysis are conducted to demonstrate the superiority of the proposed method.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10720606/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138811858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Nonlinear Instrumental Variable Models through Prediction Validity.","authors":"Chunxiao Li, Cynthia Rudin, Tyler H McCormick","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Instrumental variables (IV) are widely used in the social and health sciences in situations where a researcher would like to measure a causal effect but cannot perform an experiment. For valid causal inference in an IV model, there must be external (exogenous) variation that (i) has a sufficiently large impact on the variable of interest (called the <i>relevance assumption</i>) and where (ii) the only pathway through which the external variation impacts the outcome is via the variable of interest (called the <i>exclusion restriction</i>). For statistical inference, researchers must also make assumptions about the functional form of the relationship between the three variables. Current practice assumes (i) and (ii) are met, then postulates a functional form with limited input from the data. In this paper, we describe a framework that leverages machine learning to validate these typically unchecked but consequential assumptions in the IV framework, providing the researcher empirical evidence about the quality of the instrument given the data at hand. Central to the proposed approach is the idea of <i>prediction validity</i>. Prediction validity checks that error terms - which should be independent from the instrument - cannot be modeled with machine learning any better than a model that is identically zero. We use prediction validity to develop both one-stage and two-stage approaches for IV, and demonstrate their performance on an example relevant to climate change policy.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.","authors":"Yichi Zhang, Molei Liu, Matey Neykov, Tianxi Cai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite its potential, EHR is currently underutilized for discovery research due to its major limitation in the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, <math><mi>p</mi></math>, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small, labeled dataset (where both the label <math><mi>Y</mi></math> and the feature set <math><mi>X</mi></math> are observed) and a much larger, weakly-labeled dataset in which the feature set <math><mi>X</mi></math> is accompanied only by a surrogate label <math><mi>S</mi></math> that is available to all patients. Under a <i>working</i> prior assumption that <math><mi>S</mi></math> is related to <math><mi>X</mi></math> only through <math><mi>Y</mi></math> and allowing it to hold <i>approximately</i>, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and robustness to prior information of poor quality. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10653017/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136400046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Ni, Francesco C Stingo, Veerabhadran Baladandayuthapani
{"title":"Bayesian Covariate-Dependent Gaussian Graphical Models with Varying Structure.","authors":"Yang Ni, Francesco C Stingo, Veerabhadran Baladandayuthapani","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We introduce Bayesian Gaussian graphical models with covariates (GGMx), a class of multivariate Gaussian distributions with covariate-dependent sparse precision matrix. We propose a general construction of a functional mapping from the covariate space to the cone of sparse positive definite matrices, which encompasses many existing graphical models for heterogeneous settings. Our methodology is based on a novel mixture prior for precision matrices with a non-local component that admits attractive theoretical and empirical properties. The flexible formulation of GGMx allows both the strength and the sparsity pattern of the precision matrix (hence the graph structure) change with the covariates. Posterior inference is carried out with a carefully designed Markov chain Monte Carlo algorithm, which ensures the positive definiteness of sparse precision matrices at any given covariates' values. Extensive simulations and a case study in cancer genomics demonstrate the utility of the proposed model.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 242","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552903/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41161813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data.","authors":"Hai Shu, Zhe Qu, Hongtu Zhu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). The D-GCCA rigorously defines the decomposition on the <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> </mrow> </math> space of random variables in contrast to the Euclidean dot product space used by most existing methods, thereby being able to provide the estimation consistency for the low-rank matrix recovery. Moreover, to well calibrate common latent factors, we impose a desirable orthogonality constraint on distinctive latent factors. Existing methods, however, inadequately consider such orthogonality and may thus suffer from substantial loss of undetected common-source variation. Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables, while enjoying an appealing interpretation from the perspective of principal component analysis. Furthermore, we propose to use the variable-level proportion of signal variance explained by common or distinctive latent factors for selecting the variables most influenced. Consistent estimators of our D-GCCA method are established with good finite-sample numerical performance, and have closed-form expressions leading to efficient computation especially for large-scale data. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9380864/pdf/nihms-1815754.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10468609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials.","authors":"Daiqi Gao, Yufeng Liu, Donglin Zeng","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the accumulative evidence when patients enter the trials sequentially. It is also ethically important that future patients should have a high probability to be treated optimally based on the updated knowledge so far. In this work, we propose a new design called sequentially rule-adaptive trials to learn optimal ITRs based on the contextual bandit framework, in contrast to the response-adaptive design in traditional adaptive trials. In our design, each entering patient will be allocated with a high probability to the current best treatment for this patient, which is estimated using the past data based on some machine learning algorithm (for example, outcome weighted learning in our implementation). We explore the tradeoff between training and test values of the estimated ITR in single-stage problems by proving theoretically that for a higher probability of following the estimated ITR, the training value converges to the optimal value at a faster rate, while the test value converges at a slower rate. This problem is different from traditional decision problems in the sense that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale with empirical process to tackle the problem that cannot be solved by previous techniques for i.i.d. data. We show by numerical examples that without much loss of the test value, our proposed algorithm can improve the training value significantly as compared to existing methods. Finally, we use a real data study to illustrate the performance of the proposed method.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 250","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10419117/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10008225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings.","authors":"Zeda Li, Scott A Bruce, Tian Cai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or <i>spectral envelope</i>, obtained by assigning numerical values, or <i>scalings</i>, to categories that optimally emphasize oscillations at each frequency. Our procedure combines these two quantities to produce an interpretable and parsimonious feature-based classifier that can be used to accurately determine group membership for categorical time series. Classification consistency of the proposed method is investigated, and simulation studies are used to demonstrate accuracy in classifying categorical time series with various underlying group structures. Finally, we use the proposed method to explore key differences in oscillatory patterns of sleep stage time series for patients with different sleep disorders and accurately classify patients accordingly. The code for implementing the proposed method is available at https://github.com/zedali16/envsca.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 299","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10210597/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9529646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian subset selection and variable importance for interpretable prediction and classification.","authors":"Daniel R Kowal","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often avoided due to selection instability, lack of regularization, and difficulties with post-selection inference. We address these challenges from a Bayesian perspective. Given any Bayesian predictive model <math><mi>ℳ</mi></math>, we extract a <i>family</i> of near-optimal subsets of variables for linear prediction or classification. This strategy deemphasizes the role of a single \"best\" subset and instead advances the broader perspective that often many subsets are highly competitive. The <i>acceptable family</i> of subsets offers a new pathway for model interpretation and is neatly summarized by key members such as the smallest acceptable subset, along with new (co-) variable importance metrics based on whether variables (co-) appear in all, some, or no acceptable subsets. More broadly, we apply Bayesian decision analysis to derive the optimal linear coefficients for <i>any</i> subset of variables. These coefficients inherit both regularization and predictive uncertainty quantification via <math><mi>ℳ</mi></math>. For both simulated and real data, the proposed approach exhibits better prediction, interval estimation, and variable selection than competing Bayesian and frequentist selection methods. These tools are applied to a large education dataset with highly correlated covariates. Our analysis provides unique insights into the combination of environmental, socioeconomic, and demographic factors that predict educational outcomes, and identifies over 200 distinct subsets of variables that offer near-optimal out-of-sample predictive accuracy.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10723825/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138811860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}