Proceedings of the ... International Conference on Machine Learning: Latest Publications

Can Neural Network Memorization Be Localized?
Pratyush Maini, M. Mozer, Hanie Sedghi, Zachary Chase Lipton, J. Z. Kolter, Chiyuan Zhang
{"title":"Can Neural Network Memorization Be Localized?","authors":"Pratyush Maini, M. Mozer, Hanie Sedghi, Zachary Chase Lipton, J. Z. Kolter, Chiyuan Zhang","doi":"10.48550/arXiv.2307.09542","DOIUrl":"https://doi.org/10.48550/arXiv.2307.09542","url":null,"abstract":"Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks $textit{memorize}$\"hard\"examples in the final few layers of the model. Memorization refers to the ability to correctly predict on $textit{atypical}$ examples of the training set. In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model. First, via three experimental sources of converging evidence, we find that most layers are redundant for the memorization of examples and the layers that contribute to example memorization are, in general, not the final layers. The three sources are $textit{gradient accounting}$ (measuring the contribution to the gradient norms from memorized and clean examples), $textit{layer rewinding}$ (replacing specific model weights of a converged model with previous training checkpoints), and $textit{retraining}$ (training rewound layers only on clean examples). Second, we ask a more generic question: can memorization be localized $textit{anywhere}$ in a model? We discover that memorization is often confined to a small number of neurons or channels (around 5) of the model. Based on these insights we propose a new form of dropout -- $textit{example-tied dropout}$ that enables us to direct the memorization of examples to an apriori determined set of neurons. By dropping out these neurons, we are able to reduce the accuracy on memorized examples from $100%to3%$, while also reducing the generalization gap.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"49 1","pages":"23536-23557"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74074256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Autoregressive Diffusion Model for Graph Generation
Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Prakash, Chao Zhang
{"title":"Autoregressive Diffusion Model for Graph Generation","authors":"Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Prakash, Chao Zhang","doi":"10.48550/arXiv.2307.08849","DOIUrl":"https://doi.org/10.48550/arXiv.2307.08849","url":null,"abstract":"Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"8 1","pages":"17391-17408"},"PeriodicalIF":0.0,"publicationDate":"2023-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81534047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks
Dominik Schnaus, Jongseok Lee, D. Cremers, Rudolph Triebel
{"title":"Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks","authors":"Dominik Schnaus, Jongseok Lee, D. Cremers, Rudolph Triebel","doi":"10.48550/arXiv.2307.07753","DOIUrl":"https://doi.org/10.48550/arXiv.2307.07753","url":null,"abstract":"In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"17 1","pages":"30252-30284"},"PeriodicalIF":0.0,"publicationDate":"2023-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88862562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
Leo Klarner, Tim G. J. Rudner, M. Reutlinger, Torsten Schindler, G. Morris, C. Deane, Y. Teh
{"title":"Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions","authors":"Leo Klarner, Tim G. J. Rudner, M. Reutlinger, Torsten Schindler, G. Morris, C. Deane, Y. Teh","doi":"10.48550/arXiv.2307.15073","DOIUrl":"https://doi.org/10.48550/arXiv.2307.15073","url":null,"abstract":"Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift$unicode{x2013}unicode{x2013}$a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"31 1","pages":"17176-17197"},"PeriodicalIF":0.0,"publicationDate":"2023-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79998982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Sequential Monte Carlo Learning for Time Series Structure Discovery
Feras A. Saad, Brian Patton, Matt Hoffman, R. Saurous, Vikash K. Mansinghka
{"title":"Sequential Monte Carlo Learning for Time Series Structure Discovery","authors":"Feras A. Saad, Brian Patton, Matt Hoffman, R. Saurous, Vikash K. Mansinghka","doi":"10.48550/arXiv.2307.09607","DOIUrl":"https://doi.org/10.48550/arXiv.2307.09607","url":null,"abstract":"This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both in\"online\"settings, where new data is incorporated sequentially in time, and in\"offline\"settings, by using nested subsets of historical data to anneal the posterior. Empirical measurements on real-world time series show that our method can deliver 10x--100x runtime speedups over previous MCMC and greedy-search structure learning algorithms targeting the same model family. We use our method to perform the first large-scale evaluation of Gaussian process time series structure learning on a prominent benchmark of 1,428 econometric datasets. The results show that our method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons as compared to widely used statistical and neural baselines that struggle on this challenging data.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"1 1","pages":"29473-29489"},"PeriodicalIF":0.0,"publicationDate":"2023-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76885590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Trainability, Expressivity and Interpretability in Gated Neural ODEs
T. Kim, T. Can, K. Krishnamurthy
{"title":"Trainability, Expressivity and Interpretability in Gated Neural ODEs","authors":"T. Kim, T. Can, K. Krishnamurthy","doi":"10.48550/arXiv.2307.06398","DOIUrl":"https://doi.org/10.48550/arXiv.2307.06398","url":null,"abstract":"Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"7 1","pages":"16393-16423"},"PeriodicalIF":0.0,"publicationDate":"2023-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78462711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation
Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han
{"title":"Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation","authors":"Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han","doi":"10.48550/arXiv.2307.05948","DOIUrl":"https://doi.org/10.48550/arXiv.2307.05948","url":null,"abstract":"Generating unlabeled data has been recently shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the generated data of the existing methods are extremely similar or even the same. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net will generate data via minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. By DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that the DEG-Net outperforms existing FHA baselines and further verifies that generating diverse data plays a vital role in addressing the FHA problem","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"89 1","pages":"8260-8275"},"PeriodicalIF":0.0,"publicationDate":"2023-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80304652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Conformalization of Sparse Generalized Linear Models
E. Guha, Eugène Ndiaye, X. Huo
{"title":"Conformalization of Sparse Generalized Linear Models","authors":"E. Guha, Eugène Ndiaye, X. Huo","doi":"10.48550/arXiv.2307.05109","DOIUrl":"https://doi.org/10.48550/arXiv.2307.05109","url":null,"abstract":"Given a sequence of observable variables ${(x_1, y_1), ldots, (x_n, y_n)}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"5 1","pages":"11871-11887"},"PeriodicalIF":0.0,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88597018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Injecting Logical Constraints into Neural Networks via Straight-Through Estimators
Zhun Yang, Joohyung Lee, Chi-youn Park
{"title":"Injecting Logical Constraints into Neural Networks via Straight-Through Estimators","authors":"Zhun Yang, Joohyung Lee, Chi-youn Park","doi":"10.48550/arXiv.2307.04347","DOIUrl":"https://doi.org/10.48550/arXiv.2307.04347","url":null,"abstract":"Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be applied to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss using gradient descent via a straight-through-estimator updates the neural network's weights in the direction that the binarized outputs satisfy the logical constraints. The experimental results show that by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation for computing gradients. Also, we demonstrate that our method applies to different types of neural networks, such as MLP, CNN, and GNN, making them learn with no or fewer labeled data by learning directly from known constraints.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"1 1","pages":"25096-25122"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76693262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
TGRL: An Algorithm for Teacher Guided Reinforcement Learning
Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
{"title":"TGRL: An Algorithm for Teacher Guided Reinforcement Learning","authors":"Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal","doi":"10.48550/arXiv.2307.03186","DOIUrl":"https://doi.org/10.48550/arXiv.2307.03186","url":null,"abstract":"Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $textit{principled}$ approach, along with an approximate implementation for $textit{dynamically}$ and $textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"41 1","pages":"31077-31093"},"PeriodicalIF":0.0,"publicationDate":"2023-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86499350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2