{"title":"Reinforcement Learning with Function Approximation: From Linear to Nonlinear","authors":"Jihao Long and Jiequn Han","doi":"10.4208/jml.230105","DOIUrl":"https://doi.org/10.4208/jml.230105","url":null,"abstract":"","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135887743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why Self-Attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries","authors":"Chao Ma and Lexing Ying null","doi":"10.4208/jml.221206","DOIUrl":"https://doi.org/10.4208/jml.221206","url":null,"abstract":"","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135142632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\">Selective inference for <ns0:math><ns0:mi>k</ns0:mi></ns0:math>-means clustering.","authors":"Yiqun T Chen, Daniela M Witten","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the problem of testing for a difference in means between clusters of observations identified via <math><mi>k</mi></math>-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly-tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of <math><mi>k</mi></math>-means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the <math><mi>k</mi></math>-means algorithm. We show that the p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using <math><mi>k</mi></math>-means clustering in finite samples, and can be efficiently computed. We apply our proposal on hand-written digits data and on single-cell RNA-sequencing data.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10805457/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139543526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Di Zhao, Weiming Li, Wengu Chen, Peng Song, and Han Wang null
{"title":"RNN-Attention Based Deep Learning for Solving Inverse Boundary Problems in Nonlinear Marshak Waves","authors":"Di Zhao, Weiming Li, Wengu Chen, Peng Song, and Han Wang null","doi":"10.4208/jml.221209","DOIUrl":"https://doi.org/10.4208/jml.221209","url":null,"abstract":". Radiative transfer, described by the radiative transfer equation (RTE), is one of the dominant energy exchange processes in the inertial confinement fusion (ICF) experiments. The Marshak wave problem is an important benchmark for time-dependent RTE. In this work, we present a neural network architecture termed RNN-attention deep learning (RADL) as a surrogate model to solve the inverse boundary problem of the nonlinear Marshak wave in a data-driven fashion. We train the surrogate model by numerical simulation data of the forward problem, and then solve the inverse problem by minimizing the distance between the target solution and the surrogate predicted solution concerning the boundary condition. This minimization is made efficient because the surrogate model by-passes the expensive numerical solution, and the model is differentiable so the gradient-based optimization algorithms are adopted. The effectiveness of our approach is demonstrated by solving the inverse boundary problems of the Marshak wave benchmark in two case studies: where the transport process is modeled by RTE and where it is modeled by its nonlinear diffusion approximation (DA). Last but not least, the importance of using both the RNN and the factor-attention blocks in the RADL model is illustrated, and the data efficiency of our model is investigated in this work.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"75 1","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74640699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference for Gaussian Processes with Matérn Covariogram on Compact Riemannian Manifolds.","authors":"Didong Li, Wenpin Tang, Sudipto Banerjee","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Gaussian processes are widely employed as versatile modelling and predictive tools in spatial statistics, functional data analysis, computer modelling and diverse applications of machine learning. They have been widely studied over Euclidean spaces, where they are specified using covariance functions or covariograms for modelling complex dependencies. There is a growing literature on Gaussian processes over Riemannian manifolds in order to develop richer and more flexible inferential frameworks for non-Euclidean data. While numerical approximations through graph representations have been well studied for the Matérn covariogram and heat kernel, the behaviour of asymptotic inference on the parameters of the covariogram has received relatively scant attention. We focus on asymptotic behaviour for Gaussian processes constructed over compact Riemannian manifolds. Building upon a recently introduced Matérn covariogram on a compact Riemannian manifold, we employ formal notions and conditions for the equivalence of two Matérn Gaussian random measures on compact manifolds to derive the parameter that is identifiable, also known as the microergodic parameter, and formally establish the consistency of the maximum likelihood estimate and the asymptotic optimality of the best linear unbiased predictor. The circle is studied as a specific example of compact Riemannian manifolds with numerical experiments to illustrate and corroborate the theory.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361735/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9876354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Data Selection.","authors":"Eli N Weinstein, Jeffrey W Miller","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the \"data selection\" problem: finding a lower-dimensional statistic-such as a subset of variables-that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining \"background\" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing data selection, the \"Stein volume criterion (SVC)\", that does not require fitting a nonparametric model. The SVC takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback-Leibler divergence. We prove that the SVC is consistent for data selection, and establish consistency and asymptotic normality of the corresponding generalized posterior on parameters. We apply the SVC to the analysis of single-cell RNA sequencing data sets using probabilistic principal components analysis and a spin glass model of gene regulation.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 23","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194814/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9574086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DART: Distance Assisted Recursive Testing.","authors":"Xuechan Li, Anthony D Sung, Jichun Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Multiple testing is a commonly used tool in modern data science. Sometimes, the hypotheses are embedded in a space; the distances between the hypotheses reflect their co-null/co-alternative patterns. Properly incorporating the distance information in testing will boost testing power. Hence, we developed a new multiple testing framework named Distance Assisted Recursive Testing (DART). DART features in joint artificial intelligence (AI) and statistics modeling. It has two stages. The first stage uses AI models to construct an aggregation tree that reflects the distance information. The second stage uses statistical models to embed the testing on the tree and control the false discovery rate. Theoretical analysis and numerical experiments demonstrated that DART generates valid, robust, and powerful results. We applied DART to a clinical trial in the allogeneic stem cell transplantation study to identify the gut microbiota whose abundance was impacted by post-transplant care.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11636646/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference for a Large Directed Acyclic Graph with Unspecified Interventions.","authors":"Chunlin Li, Xiaotong Shen, Wei Pan","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires to identify the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. Also, we show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497226/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10242964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fair Data Representation for Machine Learning at the Pareto Frontier.","authors":"Shizhou Xu, Thomas Strohmer","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>As machine learning powered decision-making becomes increasingly important in our daily lives, it is imperative to strive for fairness in the underlying data processing. We propose a pre-processing algorithm for fair data representation via which <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> <mo>(</mo> <mtext>ℙ</mtext> <mo>)</mo></mrow> </math> -objective supervised learning results in estimations of the Pareto frontier between prediction error and statistical disparity. Particularly, the present work applies the optimal affine transport to approach the post-processing Wasserstein barycenter characterization of the optimal fair <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> </mrow> </math> -objective supervised learning via a pre-processing data deformation. Furthermore, we show that the Wasserstein geodesics from learning outcome marginals to their barycenter characterizes the Pareto frontier between <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> </mrow> </math> -loss and total Wasserstein distance among the marginals. Numerical simulations underscore the advantages: (1) the pre-processing step is compositive with arbitrary conditional expectation estimation supervised learning methods and unseen data; (2) the fair representation protects the sensitive information by limiting the inference capability of the remaining data with respect to the sensitive data; (3) the optimal affine maps are computationally efficient even for high-dimensional data.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494318/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training?","authors":"Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J Su","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning using a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an <i>approximate</i> alternative between these two baseline algorithms for federated learning: the former algorithm is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that makes a choice between the two baseline algorithms is rate-optimal. Another implication is that the popular FedAvg following by local fine tuning strategy is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299893/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}