{"title":"The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem","authors":"Guy Blanc, Alexandre Hayderi, Caleb Koch, Li-Yang Tan","doi":"arxiv-2409.11597","DOIUrl":"https://doi.org/arxiv-2409.11597","url":null,"abstract":"Smooth boosters generate distributions that do not place too much weight on any given example. Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory. We study and settle the sample complexity of smooth boosting: we exhibit a class that can be weakly learned to $\gamma$-advantage over smooth distributions with $m$ samples, for which strong learning over the uniform distribution requires $\tilde{\Omega}(1/\gamma^2) \cdot m$ samples. This matches the overhead of existing smooth boosters and provides the first separation from the setting of distribution-independent boosting, for which the corresponding overhead is $O(1/\gamma)$. Our work also sheds new light on Impagliazzo's hardcore theorem from complexity theory, all known proofs of which can be cast in the framework of smooth boosting. For a function $f$ that is mildly hard against size-$s$ circuits, the hardcore theorem provides a set of inputs on which $f$ is extremely hard against size-$s'$ circuits. A downside of this important result is the loss in circuit size, i.e. that $s' \ll s$. Answering a question of Trevisan, we show that this size loss is necessary and, in fact, the parameters achieved by known proofs are the best possible.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
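The smoothness condition in the abstract above caps how much probability mass a booster may place on any single example. A minimal numpy sketch of a capped reweighting step is below; the exponential update rule, the function name, and the parameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def smooth_weights(margins, eta=0.5, smoothness=2.0):
    """Exponential weights clipped so that no example receives more than
    smoothness/n probability mass (a 'smooth' distribution)."""
    n = len(margins)
    w = np.exp(-eta * np.asarray(margins, dtype=float))
    w /= w.sum()
    cap = smoothness / n
    # Clip and renormalize; renormalizing can push weights back over the
    # cap, so repeat until the cap holds everywhere.
    for _ in range(100):
        w = np.minimum(w, cap)
        w /= w.sum()
        if w.max() <= cap + 1e-12:
            break
    return w

# Examples with large positive margin (classified correctly with room to
# spare) are down-weighted; hard examples gain weight, up to the cap.
d = smooth_weights([2.0, 1.0, 0.0, -1.0], eta=0.5, smoothness=2.0)
```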
{"title":"Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier","authors":"Carine Hue, Marc Boullé","doi":"arxiv-2409.11100","DOIUrl":"https://doi.org/arxiv-2409.11100","url":null,"abstract":"We study supervised classification for datasets with a very large number of input variables. The naïve Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real data applications. When the strong naïve Bayes assumption of conditional independence of the input variables given the target variable is not valid, variable selection and model averaging are two common ways to improve the performance. In the case of the naïve Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of variable weights in such a weighted naïve Bayes classifier. We propose a sparse regularization of the model log-likelihood, which takes into account prior penalization costs related to each input variable. Compared to the averaging-based classifiers used up until now, our main goal is to obtain parsimonious robust models with fewer variables and equivalent performance. The direct estimation of the variable weights amounts to a non-convex optimization problem for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. The various proposed algorithms result in optimization-based weighted naïve Bayes classifiers, which are evaluated on benchmark datasets and positioned w.r.t. a reference averaging-based classifier.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
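In a weighted naïve Bayes classifier of the kind described above, each variable's log-likelihood contribution is multiplied by a weight, so a zero weight prunes the variable entirely. A hedged numpy sketch with Gaussian class conditionals follows; the function name and toy parameters are invented for illustration and are not the FNB model itself.

```python
import numpy as np

def weighted_nb_log_scores(x, priors, means, stds, weights):
    """Class log-scores for a weighted naive Bayes model:
    log P(y) + sum_j w_j * log N(x_j; mean_{y,j}, std_{y,j})."""
    log_lik = (-0.5 * ((x - means) / stds) ** 2
               - np.log(stds * np.sqrt(2 * np.pi)))   # shape (classes, vars)
    return np.log(priors) + (weights * log_lik).sum(axis=1)

# Two classes, three variables; the sparse weight vector drops variable 3.
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 5.0]])
stds = np.ones((2, 3))
weights = np.array([1.0, 0.5, 0.0])   # parsimonious: third variable pruned
scores = weighted_nb_log_scores(np.array([0.9, 1.1, 0.0]), priors, means, stds, weights)
pred = int(np.argmax(scores))
```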
{"title":"Partially Observable Contextual Bandits with Linear Payoffs","authors":"Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh","doi":"arxiv-2409.11521","DOIUrl":"https://doi.org/arxiv-2409.11521","url":null,"abstract":"The standard contextual bandit framework assumes fully observable and actionable contexts. In this work, we consider a new bandit setting with partially observable, correlated contexts and linear payoffs, motivated by applications in finance where decision making is based on market information that typically displays temporal correlation and is not fully observed. We make the following contributions, marrying ideas from statistical signal processing with bandits: (i) We propose an algorithmic pipeline named EMKF-Bandit, which integrates system identification, filtering, and classic contextual bandit algorithms into an iterative method alternating between latent parameter estimation and decision making. (ii) We analyze EMKF-Bandit when Thompson sampling is selected as the bandit algorithm and show that it incurs sub-linear regret under conditions on the filtering. (iii) We conduct numerical simulations that demonstrate the benefits and practical applicability of the proposed pipeline.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
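One way to picture the filtering-plus-bandit alternation described above is a Kalman update on the latent context followed by Thompson sampling on the linear payoff. This is a generic sketch under standard linear-Gaussian assumptions, not the EMKF-Bandit algorithm itself; all names, matrices, and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def kalman_update(mu, P, y, A, C, Q, R):
    """One predict/update step of a Kalman filter tracking the latent context
    from a noisy, partial observation y."""
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R                    # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)         # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ C) @ P_pred
    return mu_new, P_new

def thompson_action(ctx_est, theta_mean, theta_cov, n_actions):
    """Sample one payoff parameter per action from its posterior and pick the
    argmax of the sampled linear payoff <theta_a, filtered context>."""
    samples = rng.multivariate_normal(theta_mean, theta_cov, size=n_actions)
    return int(np.argmax(samples @ ctx_est))

# Filter the latent context from an observation, then act on the estimate.
mu, P = kalman_update(np.zeros(2), np.eye(2), np.array([1.0, 0.0]),
                      np.eye(2), np.eye(2), 0.1 * np.eye(2), 0.1 * np.eye(2))
a = thompson_action(mu, np.zeros(2), np.eye(2), n_actions=3)
```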
{"title":"Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons","authors":"Farhad Pourkamali-Anaraki","doi":"arxiv-2409.10463","DOIUrl":"https://doi.org/arxiv-2409.10463","url":null,"abstract":"Multilayer Perceptrons (MLPs) have long been a cornerstone in deep learning, known for their capacity to model complex relationships. Recently, Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative, utilizing highly flexible learnable activation functions directly on network edges, a departure from the neuron-centric approach of MLPs. However, KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. This paper presents a comprehensive comparative study of MLPs and KANs from both algorithmic and experimental perspectives, with a focus on low-data regimes. We introduce an effective technique for designing MLPs with unique, parameterized activation functions for each neuron, enabling a more balanced comparison with KANs. Using empirical evaluations on simulated data and two real-world data sets from medicine and engineering, we explore the trade-offs between model complexity and accuracy, with particular attention to the role of network depth. Our findings show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters, especially when the sample size is limited to around one hundred. For example, in a three-class classification problem within additive manufacturing, MLPs achieve a median accuracy of 0.91, significantly outperforming KANs, which only reach a median accuracy of 0.53 with default hyperparameters. These results offer valuable insights into the impact of activation function selection in neural networks.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
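The "individualized activation function per neuron" idea above can be illustrated with a PReLU-style learnable slope per hidden unit. This is one simple parameterization chosen for illustration; the paper's actual activation family may differ.

```python
import numpy as np

def prelu(z, alpha):
    """PReLU with a separate learnable slope per neuron: z if z > 0,
    else alpha * z, where alpha has one entry per neuron."""
    return np.where(z > 0, z, alpha * z)

# Forward pass of a hidden layer of 3 neurons, each carrying its own
# activation parameter (toy weights; alpha would be learned by gradient descent).
W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])
alpha = np.array([0.1, 0.2, 0.3])   # one slope per neuron
h = prelu(W @ np.array([1.0, 2.0]), alpha)
```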
{"title":"Multidimensional Deconvolution with Profiling","authors":"Huanbiao Zhu, Krish Desai, Mikael Kuusela, Vinicius Mikuni, Benjamin Nachman, Larry Wasserman","doi":"arxiv-2409.10421","DOIUrl":"https://doi.org/arxiv-2409.10421","url":null,"abstract":"In many experimental contexts, it is necessary to statistically remove the impact of instrumental effects in order to physically interpret measurements. This task has been extensively studied in particle physics, where the deconvolution task is called unfolding. A number of recent methods have shown how to perform high-dimensional, unbinned unfolding using machine learning. However, one of the assumptions in all of these methods is that the detector response is accurately modeled in the Monte Carlo simulation. In practice, the detector response depends on a number of nuisance parameters that can be constrained with data. We propose a new algorithm called Profile OmniFold (POF), which works in an iterative manner similar to the OmniFold (OF) algorithm while being able to simultaneously profile the nuisance parameters. We illustrate the method with a Gaussian example as a proof of concept, highlighting its promising capabilities.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees","authors":"Zhaosong Lu, Sanyou Mei, Yifeng Xiao","doi":"arxiv-2409.09906","DOIUrl":"https://doi.org/arxiv-2409.09906","url":null,"abstract":"In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both the constraints and first-order stationarity are within a prescribed accuracy of $\epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $\theta \geq 1$ and other suitable assumptions, we establish that the proposed methods achieve a sample complexity and first-order operation complexity of $\widetilde{O}(\epsilon^{-\max\{4, 2\theta\}})$ for finding a stronger $\epsilon$-stochastic stationary point, where the constraint violation is within $\epsilon$ with certainty, and the expected violation of first-order stationarity is within $\epsilon$. To the best of our knowledge, this is the first work to develop methods with provable complexity guarantees for finding an approximate stochastic stationary point of such problems that nearly satisfies all constraints with certainty.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
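A truncated recursive momentum estimator of the kind mentioned above can be sketched in the STORM style: combine the fresh stochastic gradient with the previous estimate's correction term, then project onto a ball. The signature, the STORM form, and truncation-by-projection are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def truncated_recursive_momentum(d_prev, g_curr, g_prev_at_curr_sample, beta, radius):
    """Recursive-momentum gradient estimator with truncation:
    d_t = g(x_t) + (1 - beta) * (d_{t-1} - g(x_{t-1})), where both gradients
    are evaluated on the same fresh sample, then d_t is projected onto a
    ball of the given radius (the 'truncation')."""
    d = g_curr + (1.0 - beta) * (d_prev - g_prev_at_curr_sample)
    norm = np.linalg.norm(d)
    if norm > radius:
        d = d * (radius / norm)   # project onto the radius-ball
    return d

# One update step; the raw estimate has norm 7.5 and is truncated to norm 1.
d_example = truncated_recursive_momentum(np.array([3.0, 4.0]), np.array([3.0, 4.0]),
                                         np.zeros(2), beta=0.5, radius=1.0)
```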
{"title":"Robust Reinforcement Learning with Dynamic Distortion Risk Measures","authors":"Anthony Coache, Sebastian Jaimungal","doi":"arxiv-2409.10096","DOIUrl":"https://doi.org/arxiv-2409.10096","url":null,"abstract":"In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
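The quantile representation mentioned in the abstract above writes a distortion risk measure as an integral of the loss quantile function against a distortion of the probability scale. A minimal Monte Carlo sketch follows, using the CVaR distortion as a concrete instance; it is a static estimator for illustration, not the paper's dynamic robust measure or its neural estimation.

```python
import numpy as np

def distortion_risk(losses, g):
    """Monte Carlo estimate of a distortion risk measure via its quantile
    representation: rho = sum_i F^{-1}(u_i) * (g(i/n) - g((i-1)/n)),
    where g is the distortion function on [0, 1]."""
    x = np.sort(np.asarray(losses, dtype=float))  # empirical quantiles
    n = len(x)
    u = np.arange(n + 1) / n
    w = g(u[1:]) - g(u[:-1])                      # distorted probability weights
    return float((x * w).sum())

# CVaR at level 0.9 corresponds to the distortion g(u) = max(u - 0.9, 0) / 0.1,
# which places all weight on the worst 10% of losses.
cvar_g = lambda u: np.maximum(u - 0.9, 0.0) / 0.1
risk = distortion_risk(np.arange(1.0, 11.0), cvar_g)
```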
{"title":"Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design","authors":"Shengchao Liu, Divin Yan, Weitao Du, Weiyang Liu, Zhuoxinran Li, Hongyu Guo, Christian Borgs, Jennifer Chayes, Anima Anandkumar","doi":"arxiv-2409.10584","DOIUrl":"https://doi.org/arxiv-2409.10584","url":null,"abstract":"Artificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation violations, we propose NucleusDiff. It models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint between the nuclei and manifolds. We quantitatively evaluate NucleusDiff using the CrossDocked2020 dataset and a COVID-19 therapeutic target, demonstrating that NucleusDiff reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design. We also provide qualitative analysis through manifold sampling, visually confirming the effectiveness of NucleusDiff in reducing separation violations and improving binding affinities.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
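The violation rate reported above counts atom pairs that come closer than a minimum allowed distance. A small numpy sketch of that metric (the function name and threshold are illustrative; the paper's manifold-based constraint is more elaborate than this pairwise check):

```python
import numpy as np

def violation_rate(coords, d_min=1.0):
    """Fraction of atom pairs closer than the minimum allowed distance,
    i.e. the 'separation violation' rate a generator should drive to zero."""
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise displacement
    dist = np.sqrt((diff ** 2).sum(-1))              # pairwise distances
    iu = np.triu_indices(len(coords), k=1)           # unique pairs only
    return float((dist[iu] < d_min).mean())

# Three atoms on a line; only the first pair (distance 0.5) violates d_min = 1.
rate = violation_rate(np.array([[0.0, 0.0, 0.0],
                                [0.5, 0.0, 0.0],
                                [3.0, 0.0, 0.0]]))
```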
{"title":"A Bayesian Interpretation of Adaptive Low-Rank Adaptation","authors":"Haolin Chen, Philip N. Garner","doi":"arxiv-2409.10673","DOIUrl":"https://doi.org/arxiv-2409.10673","url":null,"abstract":"Motivated by the sensitivity-based importance score of the adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only has matched or surpassed the performance of using the sensitivity-based importance metric but is also a faster alternative to AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
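An SNR-style importance score of the kind referenced above can be computed from an approximate posterior over parameters as squared mean over variance; parameters with high SNR keep budget, low-SNR ones are pruned. This is a generic sketch of the SNR idea, with an assumed definition (SNR = mean² / variance), not the paper's exact allocation rule.

```python
import numpy as np

def snr_importance(mean, variance):
    """Signal-to-noise-ratio importance per parameter under an approximate
    Gaussian posterior: SNR = mean^2 / variance. High magnitude relative
    to uncertainty marks an important parameter."""
    return np.asarray(mean, dtype=float) ** 2 / np.asarray(variance, dtype=float)

# A confident large-magnitude parameter scores far above a noisy small one.
scores_example = snr_importance([2.0, 1.0], [1.0, 4.0])
```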
{"title":"Spatiotemporal Covariance Neural Networks","authors":"Andrea Cavallo, Mohammad Sabbaqi, Elvin Isufi","doi":"arxiv-2409.10068","DOIUrl":"https://doi.org/arxiv-2409.10068","url":null,"abstract":"Modeling spatiotemporal interactions in multivariate time series is key to their effective processing, but challenging because of their irregular and often unknown structure. Statistical properties of the data provide useful biases to model interdependencies and are leveraged by correlation and covariance-based networks as well as by processing pipelines relying on principal component analysis (PCA). However, PCA and its temporal extensions suffer instabilities in the covariance eigenvectors when the corresponding eigenvalues are close to each other, making their application to dynamic and streaming data settings challenging. To address these issues, we exploit the analogy between PCA and graph convolutional filters to introduce the SpatioTemporal coVariance Neural Network (STVNN), a relational learning model that operates on the sample covariance matrix of the time series and leverages joint spatiotemporal convolutions to model the data. To account for the streaming and non-stationary setting, we consider an online update of the parameters and sample covariance matrix. We prove that STVNN is stable to the uncertainties introduced by these online estimations, thus improving over temporal PCA-based methods. Experimental results corroborate our theoretical findings and show that STVNN is competitive for multivariate time series processing: it adapts to changes in the data distribution and is orders of magnitude more stable than online temporal PCA.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
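The online covariance update mentioned in the abstract above can be sketched with an exponentially weighted streaming estimator of the mean and sample covariance. This is one simple way to track covariance on streaming data, with an assumed forgetting factor and initialization; it is not the STVNN update itself.

```python
import numpy as np

class OnlineCovariance:
    """Exponentially weighted streaming estimate of the mean and the sample
    covariance matrix, suitable for non-stationary data (old samples decay)."""
    def __init__(self, dim, alpha=0.1):
        self.mu = np.zeros(dim)    # running mean (initialization assumed)
        self.cov = np.eye(dim)     # running covariance (initialization assumed)
        self.alpha = alpha         # forgetting factor

    def update(self, x):
        self.mu = (1 - self.alpha) * self.mu + self.alpha * x
        d = (x - self.mu)[:, None]
        self.cov = (1 - self.alpha) * self.cov + self.alpha * (d @ d.T)
        return self.cov

# Feed one streaming sample and read back the updated covariance estimate.
oc = OnlineCovariance(dim=2)
C = oc.update(np.array([1.0, 2.0]))
```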