Latest Articles in Transactions on Machine Learning Research

TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining.
Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie
{"title":"TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining.","authors":"Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Large-scale general domain pretraining followed by downstream-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still lead to performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically involve continued pretraining on task-specific unlabeled datasets or introducing additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often determine the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language processing tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight. Our code is available at https://github.com/ruz048/TapWeight.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377235/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144982041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
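The multi-level reweighting idea in the TapWeight abstract can be made concrete with a one-step unrolled hypergradient, a standard approximation for bilevel problems. The following is a minimal, self-contained sketch, not the authors' implementation; the toy objectives, data, and learning rates are all illustrative assumptions. Model weights take a virtual SGD step on the weighted pretraining loss, and the objective weights are updated by differentiating the downstream loss through that step.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)
model = nn.Linear(8, 8)
names = [n for n, _ in model.named_parameters()]
w_logits = torch.zeros(2, requires_grad=True)          # weights for 2 toy objectives

x = torch.randn(32, 8)                                 # unlabeled pretraining batch
x_val, y_val = torch.randn(16, 8), torch.randn(16, 8)  # downstream feedback data

def pretrain_losses(params):
    out = functional_call(model, dict(zip(names, params)), (x,))
    return [((out - x) ** 2).mean(),                   # toy reconstruction objective
            out.abs().mean()]                          # toy sparsity objective

def downstream_loss(params):
    out = functional_call(model, dict(zip(names, params)), (x_val,))
    return ((out - y_val) ** 2).mean()

inner_lr, outer_lr = 1e-2, 1e-1
params = [p.detach().clone().requires_grad_(True) for p in model.parameters()]

for step in range(100):
    w = torch.softmax(w_logits, dim=0)
    total = sum(wk * lk for wk, lk in zip(w, pretrain_losses(params)))
    grads = torch.autograd.grad(total, params, create_graph=True)
    new_params = [p - inner_lr * g for p, g in zip(params, grads)]   # virtual step
    hyper_grad, = torch.autograd.grad(downstream_loss(new_params), (w_logits,))
    with torch.no_grad():
        w_logits -= outer_lr * hyper_grad              # downstream feedback on weights
    params = [p.detach().requires_grad_(True) for p in new_params]  # commit step
```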
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization.
Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K Gupta, Pengtao Xie
{"title":"Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization.","authors":"Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K Gupta, Pengtao Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency. Our code is available at https://github.com/Alexiland/MLO-MAE.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356090/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
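As a rough illustration of downstream-guided masking, the sketch below scores patches with a small learnable module and keeps only the top-scoring ones for the encoder. It is not MLO-MAE itself: the paper learns the masking strategy through multi-level optimization with end-to-end downstream feedback, whereas this stand-in simply lets gradients reach the scorer through a soft gating path; all shapes and the keep ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LearnedMasker(nn.Module):
    """Score patches and keep the top-k as visible tokens for the MAE encoder.
    The sigmoid gate gives the scorer a gradient path (straight-through style)."""
    def __init__(self, dim, keep_ratio=0.25):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)
        self.keep_ratio = keep_ratio

    def forward(self, patches):                          # patches: (B, N, D)
        B, N, D = patches.shape
        scores = self.scorer(patches).squeeze(-1)        # per-patch informativeness
        k = max(1, int(N * self.keep_ratio))
        keep = scores.topk(k, dim=1).indices             # hard selection of patches
        gated = patches * torch.sigmoid(scores).unsqueeze(-1)
        visible = torch.gather(gated, 1, keep.unsqueeze(-1).expand(-1, -1, D))
        return visible, keep                             # encoder sees `visible`

masker = LearnedMasker(dim=64)
visible, keep = masker(torch.randn(2, 196, 64))          # e.g., 14x14 ViT patches
```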
Accelerating Learned Image Compression Through Modeling Neural Training Dynamics.
Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu
{"title":"Accelerating Learned Image Compression Through Modeling Neural Training Dynamics.","authors":"Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into few separate modes where parameters are expressed as affine transformations of reference parameters within the same mode. By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, interpolating sampled weights from stochastic gradient descent (SGD) training to obtain the moving average weights, ensuring smooth temporal behavior and minimizing training state variances. Overall, our method significantly reduces training space dimensions and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis on the Noisy quadratic model, showing that the proposed method achieves a lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12129407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144210455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
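Two ingredients from this abstract lend themselves to a compact sketch: expressing a non-reference weight as an affine transformation of a reference weight in the same mode (the STDET idea), and folding sampled SGD weights into a moving average (SMA). Both snippets below are generic reconstructions under assumed interfaces, not the paper's code; the clustering of parameters into modes and the sensitivity-aware embedding schedule are omitted.

```python
import torch
import torch.nn as nn

class AffineTied(nn.Module):
    """STDET-style tying (sketch): a non-reference weight is expressed as an
    affine map of a shared reference weight, so only (a, b) remain trainable."""
    def __init__(self, ref: nn.Parameter):
        super().__init__()
        self.ref = ref                        # shared reference within the mode
        self.a = nn.Parameter(torch.ones(()))
        self.b = nn.Parameter(torch.zeros(()))

    def weight(self):
        return self.a * self.ref + self.b

@torch.no_grad()
def sma_update(avg_model: nn.Module, live_model: nn.Module, decay=0.999):
    """SMA (sketch): interpolate sampled SGD weights into a moving average,
    smoothing temporal behavior of the training trajectory."""
    for p_avg, p in zip(avg_model.parameters(), live_model.parameters()):
        p_avg.lerp_(p, 1.0 - decay)           # p_avg <- decay*p_avg + (1-decay)*p
```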
On the stability of gradient descent with second order dynamics for time-varying cost functions.
Travis E Gibson, Sawal Acharya, Anjali Parashar, Joseph E Gaudio, Anuradha M Annaswamy
{"title":"On the stability of gradient descent with second order dynamics for time-varying cost functions.","authors":"Travis E Gibson, Sawal Acharya, Anjali Parashar, Joseph E Gaudio, Anuradha M Annaswamy","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Gradient based optimization algorithms deployed in Machine Learning (ML) applications are often analyzed and compared by their convergence rates or regret bounds. While these rates and bounds convey valuable information they don't always directly translate to stability guarantees. Stability and similar concepts, like robustness, will become ever more important as we move towards deploying models in real-time and safety critical systems. In this work we build upon the results in Gaudio et al. 2021 and Moreu & Annaswamy 2022 for gradient descent with second order dynamics when applied to explicitly time varying cost functions and provide more general stability guarantees. These more general results can aid in the design and certification of these optimization schemes so as to help ensure safe and reliable deployment for real-time learning applications. We also hope that the techniques provided here will stimulate and cross-fertilize the analysis that occurs on the same algorithms from the online learning and stochastic optimization communities.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12284918/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
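For a feel of the setting, the toy below runs a generic momentum-type (second-order) gradient update on an explicitly time-varying quadratic cost with a drifting minimizer. It only illustrates the problem class: the specific higher-order tuner analyzed in the paper (building on Gaudio et al. 2021) is not reproduced here, and the gains are arbitrary choices.

```python
import numpy as np

# Time-varying cost f_t(x) = 0.5 * ||x - c(t)||^2 with a drifting minimizer c(t).
gamma, beta = 0.1, 0.9                    # step size and momentum (illustrative)
x, v = np.zeros(2), np.zeros(2)
for t in range(200):
    c = np.array([np.sin(0.05 * t), np.cos(0.05 * t)])  # moving target
    grad = x - c                                        # exact gradient of f_t at x
    v = beta * v - gamma * grad                         # second-order dynamics
    x = x + v
    # For suitable (gamma, beta), the tracking error ||x - c|| remains bounded;
    # that boundedness is a stability guarantee, distinct from a regret bound.
```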
Transformer Architecture Search for Improving Out-of-Domain Generalization in Machine Translation.
Yiheng He, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie
{"title":"Transformer Architecture Search for Improving Out-of-Domain Generalization in Machine Translation.","authors":"Yiheng He, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Interest in automatically searching for Transformer neural architectures for machine translation (MT) has been increasing. Current methods show promising results in in-domain settings, where training and test data share the same distribution. However, in real-world MT applications, it is common that the test data has a different distribution than the training data. In these out-of-domain (OOD) situations, Transformer architectures optimized for the linguistic characteristics of the training sentences struggle to produce accurate translations for OOD sentences during testing. To tackle this issue, we propose a multi-level optimization based method to automatically search for neural architectures that possess robust OOD generalization capabilities. During the architecture search process, our method automatically synthesizes approximated OOD MT data, which is used to evaluate and improve the architectures' ability of generalizing to OOD scenarios. The generation of approximated OOD data and the search for optimal architectures are executed in an integrated, end-to-end manner. Evaluated across multiple datasets, our method demonstrates strong OOD generalization performance, surpassing state-of-the-art approaches. Our code is publicly available at https://github.com/yihenghe/transformer_nas.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356094/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
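The search itself can be grounded in a standard continuous relaxation. The sketch below shows a DARTS-style mixed operation, a common substrate for differentiable architecture search; it is an assumed stand-in, not this paper's search space, and the integrated synthesis of approximated OOD MT data is deliberately left out.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS-style continuous relaxation (generic sketch): the layer outputs a
    softmax-weighted sum over candidate operations; `alpha` parameterizes the
    architecture and is trained separately from the model weights."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Linear(dim, dim),
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

# In the paper's multi-level scheme, model weights train on in-domain data while
# architecture parameters like `alpha` are evaluated (and updated) against
# synthesized approximated-OOD data, end to end.
```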
Selective Classification Under Distribution Shifts.
Hengyue Liang, Le Peng, Ju Sun
{"title":"Selective Classification Under Distribution Shifts.","authors":"Hengyue Liang, Le Peng, Ju Sun","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers-either due to intrinsic statistical noise of data or for robustness issue of the classifier or beyond-in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus on the ideal statistical setting only, i.e., the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, in this paper, we propose an SC framework that takes into account distribution shifts, termed <i>generalized selective classification</i>, that covers label-shifted (or out-of-distribution) and covariate-shifted samples, in addition to typical in-distribution samples, <i>the first of its kind</i> in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers, and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. The code is available at https://github.com/sun-umn/sc_with_distshift.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470254/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145187750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
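The mechanics of selective classification with a margin-style confidence score are easy to state. Below is a minimal sketch using the raw-logit margin between the top two classes, one common margin-based score; the paper's two proposed score functions are variants in this spirit but are not reproduced here.

```python
import torch

def margin_score(logits: torch.Tensor) -> torch.Tensor:
    """Confidence = gap between the two largest logits (higher = more confident)."""
    top2 = logits.topk(2, dim=-1).values
    return top2[..., 0] - top2[..., 1]

def selective_predict(logits: torch.Tensor, threshold: float) -> torch.Tensor:
    """Predict the argmax class where the margin clears `threshold`; otherwise
    abstain, encoded as -1. Sweeping `threshold` trades coverage for risk."""
    preds = logits.argmax(dim=-1)
    preds[margin_score(logits) < threshold] = -1
    return preds
```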
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers.
Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov
{"title":"ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers.","authors":"Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 2/3/4-bit precision on as little as one 24GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 2-bit and 3-bit LLMs for the first time-leveraging state-of-the-art 2-bit QuIP# quantization and 3-bit OPTQ quantization-outperforming finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models as part of LLMTools, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362356/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
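The core mechanism described in the abstract, a quantization-agnostic backward pass through a black-box quantizer plus trainable low-rank adapters, can be sketched as follows. The `quantizer.dequantize()` interface is a placeholder assumption, not a real API from LLMTools or any quantization library; only the LoRA factors receive gradients.

```python
import torch
import torch.nn as nn

class ModularQuantLinear(nn.Module):
    """ModuLoRA-style layer (sketch): a frozen black-box quantizer holds the
    low-precision base weight, which is materialized on the fly in the forward
    pass; only the low-rank adapters A and B are trained."""
    def __init__(self, quantizer, in_features, out_features, rank=16):
        super().__init__()
        self.quantizer = quantizer                       # black-box, frozen
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero-init delta

    def forward(self, x):
        with torch.no_grad():                            # backward never enters
            W = self.quantizer.dequantize()              # the quantizer module
        return x @ W.t() + (x @ self.A.t()) @ self.B.t()  # frozen base + LoRA
```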
Federated Learning with Convex Global and Local Constraints.
Chuan He, Le Peng, Ju Sun
{"title":"Federated Learning with Convex Global and Local Constraints.","authors":"Chuan He, Le Peng, Ju Sun","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In practice, many machine learning (ML) problems come with constraints, and their applied domains involve distributed sensitive data that cannot be shared with others, e.g., in healthcare. Collaborative learning in such practical scenarios entails federated learning (FL) for ML problems with constraints, or <i>FL with constraints</i> for short. Despite the extensive developments of FL techniques in recent years, these techniques only deal with unconstrained FL problems or FL problems with simple constraints that are amenable to easy projections. There is little work dealing with FL problems with general constraints. To fill this gap, we take the first step toward building an algorithmic framework for solving FL problems with general constraints. In particular, we propose a new FL algorithm for constrained ML problems based on the proximal augmented Lagrangian (AL) method. Assuming convex objective and convex constraints plus other mild conditions, we establish the worst-case complexity of the proposed algorithm. Our numerical experiments show the effectiveness of our algorithm in performing Neyman-Pearson classification and fairness-aware learning with nonconvex constraints, in an FL setting.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11295925/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
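To make the algorithmic skeleton concrete, here is a centralized sketch of one outer iteration of a proximal augmented Lagrangian method for min f(x) subject to c(x) ≤ 0, with the inner subproblem solved by gradient steps. This is a generic reconstruction from standard AL formulas, not the paper's federated algorithm, in which the inner minimization is distributed across clients holding private data.

```python
import numpy as np

def proximal_al_outer(x, lam, grad_f, c, jac_c, rho=1.0, beta=1.0,
                      inner_steps=200, lr=1e-2):
    """One outer iteration for: min f(x)  s.t.  c(x) <= 0  (componentwise).
    The inner loop minimizes the augmented Lagrangian plus a proximal term;
    the multiplier update is the standard AL step for inequality constraints."""
    x_anchor = x.copy()                                  # proximal center
    for _ in range(inner_steps):
        active = np.maximum(0.0, lam + rho * c(x))       # active-constraint term
        g = grad_f(x) + jac_c(x).T @ active + beta * (x - x_anchor)
        x = x - lr * g                                   # inner gradient step
    lam = np.maximum(0.0, lam + rho * c(x))              # multiplier update
    return x, lam
```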
Online model selection by learning how compositional kernels evolve.
Eura Shin, Predrag Klasnja, Susan A Murphy, Finale Doshi-Velez
{"title":"Online model selection by learning how compositional kernels evolve.","authors":"Eura Shin, Predrag Klasnja, Susan A Murphy, Finale Doshi-Velez","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Motivated by the need for efficient, personalized learning in mobile health, we investigate the problem of online compositional kernel selection for multi-task Gaussian Process regression. Existing composition selection methods do not satisfy our strict criteria in health; selection must occur quickly, and the selected kernels must maintain the appropriate level of complexity, sparsity, and stability as data arrives online. We introduce the Kernel Evolution Model (KEM), a generative process on how to evolve kernel compositions in a way that manages the bias-variance trade-off as we observe more data about a user. Using pilot data, we learn a set of <i>kernel evolutions</i> that can be used to quickly select kernels for new test users. KEM reliably selects high-performing kernels for a range of synthetic and real data sets, including two health data sets.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11142638/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
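The raw ingredient KEM builds on, scoring candidate kernel compositions on the data seen so far, looks roughly like the snippet below (illustrative candidates and data; scikit-learn is an assumed stand-in). What KEM adds, learning from pilot users how compositions should evolve as more data arrives so that online selection stays fast and stable, is not captured here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)       # toy "user" time series

candidates = {                                          # a tiny composition grammar
    "RBF": RBF() + WhiteKernel(),
    "Per": ExpSineSquared() + WhiteKernel(),
    "RBF+Per": RBF() + ExpSineSquared() + WhiteKernel(),
    "RBF*Per": RBF() * ExpSineSquared() + WhiteKernel(),
}
# Score each composition by its log marginal likelihood on the observed data.
scores = {name: GaussianProcessRegressor(kernel=k, normalize_y=True)
          .fit(X, y).log_marginal_likelihood_value_
          for name, k in candidates.items()}
print(max(scores, key=scores.get))                      # best composition so far
```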
Reliable Active Learning via Influence Functions.
Meng Xia, Ricardo Henao
{"title":"Reliable Active Learning via Influence Functions.","authors":"Meng Xia, Ricardo Henao","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Due to the high cost and time-consuming nature of collecting labeled data, having insufficient labeled data is a common challenge that can negatively impact the performance of deep learning models when applied to real-world applications. Active learning (AL) aims to reduce the cost and time required for obtaining labeled data by selecting valuable samples during model training. However, recent works have pointed out the performance unreliability of existing AL algorithms for deep learning (DL) architectures under different scenarios, which manifests as their performance being comparable (or worse) to that of basic random selection. This behavior compromises the applicability of these approaches. We address this problem by proposing a theoretically motivated AL framework for DL architectures. We demonstrate that the most valuable samples for the model are those that, unsurprisingly, improve its performance on the entire dataset, most of which is unlabeled, and present a framework to efficiently estimate such performance (or loss) via influence functions, pseudo labels and diversity selection. Experimental results show that the proposed <i>reliable active learning via influence functions</i> (RALIF) can consistently outperform the random selection baseline as well as other existing and state-of-the art active learning approaches.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483530/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145208297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
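A first-order caricature of the selection rule conveys the idea: score each unlabeled candidate, under its pseudo label, by how well its gradient aligns with the gradient of the estimated loss on the full dataset. RALIF itself additionally applies the influence-function correction (an inverse-Hessian-vector product) and a diversity-aware selection step; everything in this sketch is an illustrative assumption.

```python
import torch

def alignment_scores(model, loss_fn, candidates, pseudo_labels, pool_grad):
    """Score candidates by gradient alignment with the pool-loss gradient.
    `pool_grad` is a list of tensors: the gradient of the (pseudo-labeled)
    loss over the full, mostly unlabeled dataset at the current parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    scores = []
    for x, y in zip(candidates, pseudo_labels):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, params)            # per-candidate gradient
        scores.append(sum((gi * pi).sum() for gi, pi in zip(g, pool_grad)))
    return torch.stack(scores)       # label the top-scoring candidates next
```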