Latest Articles in Trans. Mach. Learn. Res.

Revisiting adversarial training for the worst-performing class
Trans. Mach. Learn. Res. Pub Date: 2023-02-17 DOI: 10.48550/arXiv.2302.08872
T. Pethick, Grigorios G. Chrysos, V. Cevher
{"title":"Revisiting adversarial training for the worst-performing class","authors":"T. Pethick, Grigorios G. Chrysos, V. Cevher","doi":"10.48550/arXiv.2302.08872","DOIUrl":"https://doi.org/10.48550/arXiv.2302.08872","url":null,"abstract":"Despite progress in adversarial training (AT), there is a substantial gap between the top-performing and worst-performing classes in many datasets. For example, on CIFAR10, the accuracies for the best and worst classes are 74% and 23%, respectively. We argue that this gap can be reduced by explicitly optimizing for the worst-performing class, resulting in a min-max-max optimization formulation. Our method, called class focused online learning (CFOL), includes high probability convergence guarantees for the worst class loss and can be easily integrated into existing training setups with minimal computational overhead. We demonstrate an improvement to 32% in the worst class accuracy on CIFAR10, and we observe consistent behavior across CIFAR100 and STL10. Our study highlights the importance of moving beyond average accuracy, which is particularly important in safety-critical applications.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134472245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
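The min-max-max formulation above suggests an adversarial sampling distribution over classes. Below is a minimal sketch of that general recipe, assuming an exponential-weights (Exp3-style) update driven by observed per-class losses; all names are illustrative, and the paper's actual CFOL update may differ in its details.

```python
import numpy as np

def exp_weights_class_sampler(num_classes, eta=0.1, mix=0.1):
    """Adversarial class distribution that upweights high-loss classes.

    `mix` blends in the uniform distribution so every class keeps a
    nonzero sampling probability (an Exp3-style exploration term).
    """
    log_weights = np.zeros(num_classes)

    def distribution():
        p = np.exp(log_weights - log_weights.max())
        p /= p.sum()
        return (1.0 - mix) * p + mix / num_classes

    def update(class_idx, observed_loss):
        # Importance-weighted update: rarely sampled classes receive
        # proportionally larger updates when they are sampled.
        log_weights[class_idx] += eta * observed_loss / distribution()[class_idx]

    return distribution, update

# Toy usage: class 2 keeps incurring high loss, so its probability grows.
distribution, update = exp_weights_class_sampler(num_classes=10)
rng = np.random.default_rng(0)
for _ in range(200):
    c = rng.choice(10, p=distribution())
    update(c, observed_loss=2.0 if c == 2 else 0.5)
print(distribution().round(3))
```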
On a continuous time model of gradient descent dynamics and instability in deep learning
Trans. Mach. Learn. Res. Pub Date: 2023-02-03 DOI: 10.48550/arXiv.2302.01952
Mihaela Rosca, Yan Wu, Chongli Qin, B. Dherin
{"title":"On a continuous time model of gradient descent dynamics and instability in deep learning","authors":"Mihaela Rosca, Yan Wu, Chongli Qin, B. Dherin","doi":"10.48550/arXiv.2302.01952","DOIUrl":"https://doi.org/10.48550/arXiv.2302.01952","url":null,"abstract":"The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124265023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
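For reference, the classical continuous-time model that the principal flow refines is the gradient flow dθ/dt = −∇L(θ), of which gradient descent with learning rate h is the explicit-Euler discretization. The toy comparison below shows the kind of oscillatory behavior that the plain flow misses; the PF's Hessian-eigendecomposition machinery is not reproduced here.

```python
import numpy as np

# Toy objective L(x) = 0.5 * a * x^2 with gradient a * x.
a, h, steps = 4.0, 0.45, 20
x_gd = 1.0

# Gradient descent: explicit Euler on the gradient flow with step h.
for _ in range(steps):
    x_gd -= h * a * x_gd

# Gradient flow dx/dt = -a*x has the closed form x(t) = x0 * exp(-a*t).
x_flow = 1.0 * np.exp(-a * h * steps)

# With h = 0.45 and a = 4, GD multiplies x by (1 - h*a) = -0.8 each step,
# so it oscillates in sign while the flow decays monotonically, the kind
# of discrepancy that motivates more faithful continuous-time models.
print(f"GD after {steps} steps:    {x_gd:+.6e}")
print(f"Gradient flow at t=h*n:  {x_flow:+.6e}")
```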
Dual PatchNorm
Trans. Mach. Learn. Res. Pub Date: 2023-02-02 DOI: 10.48550/arXiv.2302.01327
Manoj Kumar, Mostafa Dehghani, N. Houlsby
{"title":"Dual PatchNorm","authors":"Manoj Kumar, Mostafa Dehghani, N. Houlsby","doi":"10.48550/arXiv.2302.01327","DOIUrl":"https://doi.org/10.48550/arXiv.2302.01327","url":null,"abstract":"We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of exhaustive search for alternative LayerNorm placement strategies in the Transformer block itself. In our experiments, incorporating this trivial modification, often leads to improved accuracy over well-tuned Vision Transformers and never hurts.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127276928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
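The modification is small enough to state in code. A minimal PyTorch sketch, assuming a flatten-and-project patch-embedding stem with illustrative sizes; this is one reading of the abstract, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class DualPatchNormEmbedding(nn.Module):
    """Patch embedding wrapped in LayerNorms before and after,
    as described in the Dual PatchNorm abstract above."""

    def __init__(self, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        patch_dim = patch_size * patch_size * in_chans
        self.pre_norm = nn.LayerNorm(patch_dim)    # LN on raw flattened patches
        self.proj = nn.Linear(patch_dim, dim)      # the patch embedding itself
        self.post_norm = nn.LayerNorm(dim)         # LN on embedded tokens
        self.patch_size = patch_size

    def forward(self, x):  # x: (B, C, H, W)
        p = self.patch_size
        B, C, H, W = x.shape
        # Unfold into flattened, non-overlapping patches: (B, N, p*p*C).
        patches = x.unfold(2, p, p).unfold(3, p, p)            # (B, C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        return self.post_norm(self.proj(self.pre_norm(patches)))

tokens = DualPatchNormEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])
```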
Fourier Sensitivity and Regularization of Computer Vision Models
Trans. Mach. Learn. Res. Pub Date: 2023-01-31 DOI: 10.48550/arXiv.2301.13514
K. Krishnamachari, See-Kiong Ng, Chuan-Sheng Foo
{"title":"Fourier Sensitivity and Regularization of Computer Vision Models","authors":"K. Krishnamachari, See-Kiong Ng, Chuan-Sheng Foo","doi":"10.48550/arXiv.2301.13514","DOIUrl":"https://doi.org/10.48550/arXiv.2301.13514","url":null,"abstract":"Recent work has empirically shown that deep neural networks latch on to the Fourier statistics of training data and show increased sensitivity to Fourier-basis directions in the input. Understanding and modifying this Fourier-sensitivity of computer vision models may help improve their robustness. Hence, in this paper we study the frequency sensitivity characteristics of deep neural networks using a principled approach. We first propose a basis trick, proving that unitary transformations of the input-gradient of a function can be used to compute its gradient in the basis induced by the transformation. Using this result, we propose a general measure of any differentiable model's Fourier-sensitivity using the unitary Fourier-transform of its input-gradient. When applied to deep neural networks, we find that computer vision models are consistently sensitive to particular frequencies dependent on the dataset, training method and architecture. Based on this measure, we further propose a Fourier-regularization framework to modify the Fourier-sensitivities and frequency bias of models. Using our proposed regularizer-family, we demonstrate that deep neural networks obtain improved classification accuracy on robustness evaluations.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128895624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
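A minimal sketch of the sensitivity measure as described: take the input-gradient and apply a unitary (norm='ortho') 2D FFT. The model and the single-batch aggregation below are illustrative; the paper's exact measure may normalize or aggregate differently.

```python
import torch

def fourier_sensitivity(model, x, y):
    """Magnitude of the input-gradient in the Fourier basis.

    Because the 2D FFT with norm='ortho' is unitary, this gives the
    gradient of the loss with respect to the Fourier coefficients of
    the input, in the spirit of the 'basis trick' described above.
    """
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)           # (B, C, H, W)
    spectrum = torch.fft.fft2(grad, norm="ortho")    # unitary 2D FFT
    return spectrum.abs().mean(dim=(0, 1))           # (H, W) frequency map

# Toy usage with a small CNN on random data.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 10),
)
sens = fourier_sensitivity(model, torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))
print(sens.shape)  # torch.Size([32, 32])
```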
PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets
Trans. Mach. Learn. Res. Pub Date: 2023-01-14 DOI: 10.48550/arXiv.2302.00586
Shuo Sun, Molei Qin, Xinrun Wang, Bo An
{"title":"PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets","authors":"Shuo Sun, Molei Qin, Xinrun Wang, Bo An","doi":"10.48550/arXiv.2302.00586","DOIUrl":"https://doi.org/10.48550/arXiv.2302.00586","url":null,"abstract":"The financial markets, which involve more than $90 trillion market capitals, attract the attention of innumerable investors around the world. Recently, reinforcement learning in financial markets (FinRL) has emerged as a promising direction to train agents for making profitable investment decisions. However, the evaluation of most FinRL methods only focuses on profit-related measures and ignores many critical axes, which are far from satisfactory for financial practitioners to deploy these methods into real-world financial markets. Therefore, we introduce PRUDEX-Compass, which has 6 axes, i.e., Profitability, Risk-control, Universality, Diversity, rEliability, and eXplainability, with a total of 17 measures for a systematic evaluation. Specifically, i) we propose AlphaMix+ as a strong FinRL baseline, which leverages mixture-of-experts (MoE) and risk-sensitive approaches to make diversified risk-aware investment decisions, ii) we evaluate 8 FinRL methods in 4 long-term real-world datasets of influential financial markets to demonstrate the usage of our PRUDEX-Compass, iii) PRUDEX-Compass together with 4 real-world datasets, standard implementation of 8 FinRL methods and a portfolio management environment is released as public resources to facilitate the design and comparison of new FinRL methods. We hope that PRUDEX-Compass can not only shed light on future FinRL research to prevent untrustworthy results from stagnating FinRL into successful industry deployment but also provide a new challenging algorithm evaluation scenario for the reinforcement learning (RL) community.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131781116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
Trans. Mach. Learn. Res. Pub Date: 2023-01-13 DOI: 10.48550/arXiv.2301.05664
Taylor W. Killian, S. Parbhoo, M. Ghassemi
{"title":"Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning","authors":"Taylor W. Killian, S. Parbhoo, M. Ghassemi","doi":"10.48550/arXiv.2301.05664","DOIUrl":"https://doi.org/10.48550/arXiv.2301.05664","url":null,"abstract":"In safety-critical decision-making scenarios being able to identify worst-case outcomes, or dead-ends is crucial in order to develop safe and reliable policies in practice. These situations are typically rife with uncertainty due to unknown or stochastic characteristics of the environment as well as limited offline training data. As a result, the value of a decision at any time point should be based on the distribution of its anticipated effects. We propose a framework to identify worst-case decision points, by explicitly estimating distributions of the expected return of a decision. These estimates enable earlier indication of dead-ends in a manner that is tunable based on the risk tolerance of the designed task. We demonstrate the utility of Distributional Dead-end Discovery (DistDeD) in a toy domain as well as when assessing the risk of severely ill patients in the intensive care unit reaching a point where death is unavoidable. We find that DistDeD significantly improves over prior discovery approaches, providing indications of the risk 10 hours earlier on average as well as increasing detection by 20%.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129070471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
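The core mechanism, thresholding a risk measure of the estimated return distribution rather than its mean, can be illustrated compactly. A toy sketch assuming quantile-based return estimates and CVaR as the tunable risk measure; the thresholds, names, and reduction over actions are illustrative rather than the paper's exact construction.

```python
import numpy as np

def cvar(quantiles, alpha=0.2):
    """Conditional value-at-risk: mean of the worst alpha-fraction of
    quantile estimates (the lower tail of the return distribution)."""
    q = np.sort(np.asarray(quantiles))
    k = max(1, int(np.ceil(alpha * len(q))))
    return q[:k].mean()

def is_dead_end(per_action_quantiles, alpha=0.2, threshold=-0.8):
    """Flag a state as a risk-sensitive dead-end when even the best
    available action has an unacceptably bad lower tail."""
    best = max(cvar(q, alpha) for q in per_action_quantiles)
    return best < threshold

# Two actions, 51 quantile estimates of the return each (scaled to [-1, 1]).
rng = np.random.default_rng(0)
state = [np.clip(rng.normal(-0.90, 0.05, 51), -1, 1),   # action 0: near-certain failure
         np.clip(rng.normal(-0.85, 0.10, 51), -1, 1)]   # action 1: slightly better
print(is_dead_end(state))   # True: all actions look fatal at this risk level
```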
A Stochastic Proximal Polyak Step Size
Trans. Mach. Learn. Res. Pub Date: 2023-01-12 DOI: 10.48550/arXiv.2301.04935
Fabian Schaipp, Robert Mansel Gower, M. Ulbrich
{"title":"A Stochastic Proximal Polyak Step Size","authors":"Fabian Schaipp, Robert Mansel Gower, M. Ulbrich","doi":"10.48550/arXiv.2301.04935","DOIUrl":"https://doi.org/10.48550/arXiv.2301.04935","url":null,"abstract":"Recently, the stochastic Polyak step size (SPS) has emerged as a competitive adaptive step size scheme for stochastic gradient descent. Here we develop ProxSPS, a proximal variant of SPS that can handle regularization terms. Developing a proximal variant of SPS is particularly important, since SPS requires a lower bound of the objective function to work well. When the objective function is the sum of a loss and a regularizer, available estimates of a lower bound of the sum can be loose. In contrast, ProxSPS only requires a lower bound for the loss which is often readily available. As a consequence, we show that ProxSPS is easier to tune and more stable in the presence of regularization. Furthermore for image classification tasks, ProxSPS performs as well as AdamW with little to no tuning, and results in a network with smaller weight parameters. We also provide an extensive convergence analysis for ProxSPS that includes the non-smooth, smooth, weakly convex and strongly convex setting.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124823941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
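A minimal sketch of the idea on a toy problem, assuming the standard SPS rule γ_t = min{(f_i(x_t) − ℓ*) / (c‖∇f_i(x_t)‖²), γ_b} and an ℓ2 regularizer whose proximal operator is available in closed form; the constants and the problem are illustrative, not the paper's full algorithm.

```python
import numpy as np

def prox_l2(x, step, lam):
    """Proximal operator of r(x) = (lam/2)*||x||^2 with step size `step`."""
    return x / (1.0 + step * lam)

def prox_sps_step(x, loss_i, grad_i, lower_bound=0.0, c=0.5, gamma_max=1.0, lam=0.1):
    """One ProxSPS-style update: Polyak step on the loss alone (which has
    the easy lower bound), followed by the prox of the regularizer."""
    g = grad_i(x)
    gamma = min((loss_i(x) - lower_bound) / (c * np.dot(g, g) + 1e-12), gamma_max)
    return prox_l2(x - gamma * g, gamma, lam)

# Toy problem: f_i(x) = 0.5*(a_i.x - b_i)^2 with r(x) = (lam/2)*||x||^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
x = np.zeros(5)
for t in range(500):
    i = rng.integers(100)
    loss_i = lambda x: 0.5 * (A[i] @ x - b[i]) ** 2   # nonnegative, so 0 is a valid lower bound
    grad_i = lambda x: (A[i] @ x - b[i]) * A[i]
    x = prox_sps_step(x, loss_i, grad_i)
print(np.round(x, 3))
```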
Exploring Efficient Few-shot Adaptation for Vision Transformers
Trans. Mach. Learn. Res. Pub Date: 2023-01-06 DOI: 10.48550/arXiv.2301.02419
C. Xu
{"title":"Exploring Efficient Few-shot Adaptation for Vision Transformers","authors":"C. Xu","doi":"10.48550/arXiv.2301.02419","DOIUrl":"https://doi.org/10.48550/arXiv.2301.02419","url":null,"abstract":"The task of Few-shot Learning (FSL) aims to do the inference on novel categories containing only few labeled examples, with the help of knowledge learned from base categories containing abundant labeled training samples. While there are numerous works into FSL task, Vision Transformers (ViTs) have rarely been taken as the backbone to FSL with few trials focusing on naive finetuning of whole backbone or classification layer.} Essentially, despite ViTs have been shown to enjoy comparable or even better performance on other vision tasks, it is still very nontrivial to efficiently finetune the ViTs in real-world FSL scenarios. To this end, we propose a novel efficient Transformer Tuning (eTT) method that facilitates finetuning ViTs in the FSL tasks. The key novelties come from the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA) for the task and backbone tuning, individually. Specifically, in APT, the prefix is projected to new key and value pairs that are attached to each self-attention layer to provide the model with task-specific information. Moreover, we design the DRA in the form of learnable offset vectors to handle the potential domain gaps between base and novel data. To ensure the APT would not deviate from the initial task-specific information much, we further propose a novel prototypical regularization, which maximizes the similarity between the projected distribution of prefix and initial prototypes, regularizing the update procedure. Our method receives outstanding performance on the challenging Meta-Dataset. We conduct extensive experiments to show the efficacy of our model.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115320638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
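The APT component lends itself to a compact illustration: learned prefix vectors are projected into extra key/value pairs that every token can attend to. A minimal single-head PyTorch sketch with illustrative names, omitting the DRA and the prototypical regularization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixAttention(nn.Module):
    """Self-attention whose key/value sets are extended with projections
    of a learned prefix, in the spirit of the APT component above."""

    def __init__(self, dim=64, prefix_len=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)
        # The prefix gets its own key/value projections, carrying
        # task-specific information into every attention layer.
        self.prefix_k = nn.Linear(dim, dim)
        self.prefix_v = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, dim)
        B = x.size(0)
        pk = self.prefix_k(self.prefix).expand(B, -1, -1)   # (B, P, dim)
        pv = self.prefix_v(self.prefix).expand(B, -1, -1)
        k = torch.cat([pk, self.k(x)], dim=1)    # (B, P+N, dim)
        v = torch.cat([pv, self.v(x)], dim=1)
        attn = F.softmax(self.q(x) @ k.transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        return attn @ v                          # (B, N, dim)

out = PrefixAttention()(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```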
Reusable Options through Gradient-based Meta Learning
Trans. Mach. Learn. Res. Pub Date: 2022-12-22 DOI: 10.48550/arXiv.2212.11726
David Kuric, H. V. Hoof
{"title":"Reusable Options through Gradient-based Meta Learning","authors":"David Kuric, H. V. Hoof","doi":"10.48550/arXiv.2212.11726","DOIUrl":"https://doi.org/10.48550/arXiv.2212.11726","url":null,"abstract":"Hierarchical methods in reinforcement learning have the potential to reduce the amount of decisions that the agent needs to perform when learning new tasks. However, finding reusable useful temporal abstractions that facilitate fast learning remains a challenging problem. Recently, several deep learning approaches were proposed to learn such temporal abstractions in the form of options in an end-to-end manner. In this work, we point out several shortcomings of these methods and discuss their potential negative consequences. Subsequently, we formulate the desiderata for reusable options and use these to frame the problem of learning options as a gradient-based meta-learning problem. This allows us to formulate an objective that explicitly incentivizes options which allow a higher-level decision maker to adjust in few steps to different tasks. Experimentally, we show that our method is able to learn transferable components which accelerate learning and performs better than existing prior methods developed for this setting. Additionally, we perform ablations to quantify the impact of using gradient-based meta-learning as well as other proposed changes.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127879475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
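The meta-learning framing can be sketched in the MAML style: option parameters are scored by the loss that a quickly adapted higher-level head achieves on each task. In the toy sketch below, plain supervised losses stand in for the RL objective, and all names are illustrative rather than the paper's formulation.

```python
import torch

def meta_step(option_params, tasks, inner_lr=0.1, outer_lr=0.01, inner_steps=3):
    """One outer update: options are scored by post-adaptation task loss.

    `tasks` is a list of loss functions loss(options, head); each task
    adapts only the small `head` (the higher-level decision maker), so
    the outer gradient favors options that enable fast adaptation.
    """
    option_params = option_params.detach().clone().requires_grad_(True)
    outer_loss = 0.0
    for loss_fn in tasks:
        head = torch.zeros(option_params.shape[0], requires_grad=True)
        for _ in range(inner_steps):                       # few-step adaptation
            (g,) = torch.autograd.grad(loss_fn(option_params, head), head,
                                       create_graph=True)  # second-order grads
            head = head - inner_lr * g
        outer_loss = outer_loss + loss_fn(option_params, head)
    (g_opt,) = torch.autograd.grad(outer_loss, option_params)
    return (option_params - outer_lr * g_opt).detach()

# Toy tasks: each wants (options . head) to match a different target.
targets = [torch.tensor(1.0), torch.tensor(-2.0)]
tasks = [lambda o, h, t=t: ((o * h).sum() - t) ** 2 for t in targets]
options = torch.randn(4)
for _ in range(50):
    options = meta_step(options, tasks)
print(options)
```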
Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata
Trans. Mach. Learn. Res. Pub Date: 2022-12-13 DOI: 10.48550/arXiv.2212.06898
Patrick M. Soga, David Chiang
{"title":"Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata","authors":"Patrick M. Soga, David Chiang","doi":"10.48550/arXiv.2212.06898","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06898","url":null,"abstract":"A current goal in the graph neural network literature is to enable transformers to operate on graph-structured data, given their success on language and vision tasks. Since the transformer's original sinusoidal positional encodings (PEs) are not applicable to graphs, recent work has focused on developing graph PEs, rooted in spectral graph theory or various spatial features of a graph. In this work, we introduce a new graph PE, Graph Automaton PE (GAPE), based on weighted graph-walking automata (a novel extension of graph-walking automata). We compare the performance of GAPE with other PE schemes on both machine translation and graph-structured tasks, and we show that it generalizes several other PEs. An additional contribution of this study is a theoretical and controlled experimental comparison of many recent PEs in graph transformers, independent of the use of edge features.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124206616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
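GAPE itself builds on weighted graph-walking automata, which is more than a short snippet can carry, but one of the simpler spatial PEs it relates to, the random-walk PE (each node is encoded by the return probabilities of k-step random walks), is easy to state. A sketch of that simpler scheme, not of GAPE.

```python
import numpy as np

def random_walk_pe(adj, k=4):
    """Random-walk positional encoding: for each node, the probability of
    a length-1..k random walk returning to it (the diagonal of powers of
    the row-normalized adjacency matrix)."""
    deg = adj.sum(axis=1, keepdims=True)
    rw = adj / np.maximum(deg, 1)           # row-stochastic transition matrix
    pe, power = [], np.eye(adj.shape[0])
    for _ in range(k):
        power = power @ rw
        pe.append(np.diag(power))
    return np.stack(pe, axis=1)             # (num_nodes, k)

# A 4-cycle: return probabilities are 0 at odd steps, positive at even steps.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
print(random_walk_pe(cycle, k=4).round(3))
```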