Proceedings of Machine Learning Research: Latest Articles

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More.
Feng Wang, Yaodong Yu, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
Abstract: Since the introduction of Vision Transformer (ViT), patchification has long been regarded as a de facto image tokenization approach for plain visual architectures. By compressing the spatial size of images, this approach can effectively shorten the token sequence and reduce the computational cost of ViT-like plain architectures. In this work, we aim to thoroughly examine the information loss caused by this patchification-based compressive encoding paradigm and how it affects visual understanding. We conduct extensive patch size scaling experiments and observe an intriguing scaling law in patchification: the models consistently benefit from decreased patch sizes and attain improved predictive performance, until the patch size reaches its minimum of 1×1, i.e., pixel tokenization. This conclusion is broadly applicable across different vision tasks, various input scales, and diverse architectures such as ViT and the recent Mamba models. Moreover, as a by-product, we discover that with smaller patches, task-specific decoder heads become less critical for dense prediction. In the experiments, we successfully scale up the visual sequence to an exceptional length of 50,176 tokens, achieving a competitive test accuracy of 84.6% with a base-sized model on the ImageNet-1k benchmark. We hope this study can provide insights and theoretical foundations for future work on building non-compressive vision models. Code is available at https://github.com/wangf3014/Patch_Scaling.
Proceedings of machine learning research, volume 267, pages 65278-65290. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13021248/pdf/
Source: Semanticscholar, paper ID 147576803.
Citations: 0
Test-Time Training Provably Improves Transformers as In-context Learners.
Halil Alperen Gozeten, M Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak
Abstract: Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT, including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer), unlocking substantial inference efficiency with a negligible training cost.
Proceedings of machine learning research, volume 267, pages 20266-20295. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662752/pdf/
Source: Semanticscholar, paper ID 145650398.
Citations: 0
Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data.
Mohammad Hosseini, Maryam M Shanechi
Abstract: High-dimensional imaging of neural activity, such as widefield calcium and functional ultrasound imaging, provides a rich source of information for understanding the relationship between brain activity and behavior. Accurately modeling neural dynamics in these modalities is crucial for understanding this relationship but is hindered by the high dimensionality, complex spatiotemporal dependencies, and prevalent behaviorally irrelevant dynamics in these modalities. Existing dynamical models often employ preprocessing steps to obtain low-dimensional representations from neural image modalities. However, this process can discard behaviorally relevant information and miss spatiotemporal structure. We propose SBIND, a novel data-driven deep learning framework to model spatiotemporal dependencies in neural images and disentangle their behaviorally relevant dynamics from other neural dynamics. We validate SBIND on widefield imaging datasets, and show its extension to functional ultrasound imaging, a recent modality whose dynamical modeling has largely remained unexplored. We find that our model effectively identifies both local and long-range spatial dependencies across the brain while also dissociating behaviorally relevant neural dynamics. In doing so, SBIND outperforms existing models in neural-behavioral prediction. Overall, SBIND provides a versatile tool for investigating the neural mechanisms underlying behavior using imaging modalities.
Proceedings of machine learning research, volume 267, pages 23846-23872. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662753/pdf/
Source: Semanticscholar, paper ID 145650400.
Citations: 0
CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation.
Aditya Gorla, Ryan Wang, Zhengtong Liu, Ulzee An, Sriram Sankararaman
Abstract: We present CACTI, a masked autoencoding approach for imputing tabular data that leverages the structure in missingness patterns and contextual information. Our approach employs a novel median truncated copy masking training strategy that encourages the model to learn from empirical patterns of missingness while incorporating semantic relationships between features - captured by column names and text descriptions - to better represent feature dependence. These dual sources of inductive bias enable CACTI to outperform state-of-the-art methods - an average R² gain of 7.8% over the next best method (13.4%, 6.1%, and 5.3% under missing not at random, at random and completely at random, respectively) - across a diverse range of datasets and missingness conditions. Our results highlight the value of leveraging dataset-specific contextual information and missingness patterns to enhance imputation performance. Code is publicly available at github.com/sriramlab/CACTI.
Proceedings of machine learning research, volume 267, pages 20187-20225. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13060770/pdf/
Source: Semanticscholar, paper ID 147647869.
Citations: 0
"Why Is There a Tumor?": Tell Me the Reason, Show Me the Evidence.
Mengmeng Ma, Tang Li, Yunxiang Peng, Lu Lin, Volkan Beylergil, Binsheng Zhao, Oguz Akin, Xi Peng
Abstract: Medical AI models excel at tumor detection and segmentation. However, their latent representations often lack explicit ties to clinical semantics, producing outputs less trusted in clinical practice. Most of the existing models generate either segmentation masks/labels (localizing where without why) or textual justifications (explaining why without where), failing to ground clinical concepts in spatially localized evidence. To bridge this gap, we propose to develop models that can justify the segmentation or detection using clinically relevant terms and point to visual evidence. We address two core challenges: First, we curate a rationale dataset to tackle the lack of paired images, annotations, and textual rationales for training. The dataset includes 180K image-mask-rationale triples with quality evaluated by expert radiologists. Second, we design rationale-informed optimization that disentangles and localizes fine-grained clinical concepts in a self-supervised manner without requiring pixel-level concept annotations. Experiments across medical benchmarks show our model demonstrates superior performance in segmentation, detection, and beyond. The code is available at https://github.com/deep-real/MedRationale.
Proceedings of machine learning research, volume 267, pages 41992-42008. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13131034/pdf/
Source: Semanticscholar, paper ID 147824490.
Citations: 0
Learning Survival Distributions with the Asymmetric Laplace Distribution.
Deming Sheng, Ricardo Henao
Abstract: Probabilistic survival analysis models seek to estimate the distribution of the future occurrence (time) of an event given a set of covariates. In recent years, these models have preferred nonparametric specifications that avoid directly estimating survival distributions via discretization. Specifically, they estimate the probability of an individual event at fixed times or the time of an event at fixed probabilities (quantiles), using supervised learning. Borrowing ideas from the quantile regression literature, we propose a parametric survival analysis method based on the Asymmetric Laplace Distribution (ALD). This distribution allows for closed-form calculation of popular event summaries such as mean, median, mode, variation, and quantiles. The model is optimized by maximum likelihood to learn, at the individual level, the parameters (location, scale, and asymmetry) of the ALD distribution. Extensive results on synthetic and real-world data demonstrate that the proposed method outperforms parametric and nonparametric approaches in terms of accuracy, discrimination and calibration.
Proceedings of machine learning research, volume 267, pages 54772-54809. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12669900/pdf/
Source: Semanticscholar, paper ID 145673129.
Citations: 0
Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN.
Talal Widatalla, Richard W Shuai, Brian L Hie, Po-Ssu Huang
Abstract: Leading deep learning-based methods for fixed-backbone protein sequence design do not model protein sidechain conformation during sequence generation despite the large role the three-dimensional arrangement of sidechain atoms plays in protein conformation, stability, and overall protein function. Instead, these models implicitly reason about crucial sidechain interactions based on backbone geometry and known amino acid sequence labels. To address this, we present FAMPNN (Full-Atom MPNN), a sequence design method that explicitly models both sequence identity and sidechain conformation for each residue, where the per-token distribution of a residue's discrete amino acid identity and its continuous sidechain conformation are learned with a combined categorical cross-entropy and diffusion loss objective. We demonstrate that learning these distributions jointly is a highly synergistic task that improves sequence recovery while achieving state-of-the-art sidechain packing. Furthermore, benefits from full-atom modeling generalize from sequence recovery to practical protein design applications, such as zero-shot prediction of experimental binding and stability measurements.
Proceedings of machine learning research, volume 267, pages 66746-66771. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646570/pdf/
Source: Semanticscholar, paper ID 145643666.
Citations: 0
Tuning-Free Coreset Markov Chain Monte Carlo via Hot DoG.
Proceedings of machine learning research. Pub Date: 2025-07-01. Epub Date: 2025-07-21.
Naitong Chen, Jonathan H Huggins, Trevor Campbell
Abstract: A Bayesian coreset is a small, weighted subset of a data set that replaces the full data during inference to reduce computational cost. The state-of-the-art coreset construction algorithm, Coreset Markov chain Monte Carlo (Coreset MCMC), uses draws from an adaptive Markov chain targeting the coreset posterior to train the coreset weights via stochastic gradient optimization. However, the quality of the constructed coreset, and thus the quality of its posterior approximation, is sensitive to the stochastic optimization learning rate. In this work, we propose a learning-rate-free stochastic gradient optimization procedure, Hot-start Distance over Gradient (Hot DoG), for training coreset weights in Coreset MCMC without user tuning effort. We provide a theoretical analysis of the convergence of the coreset weights produced by Hot DoG. We also provide empirical results demonstrating that Hot DoG provides higher quality posterior approximations than other learning-rate-free stochastic gradient methods, and performs competitively with optimally tuned ADAM.
Proceedings of machine learning research, volume 286, pages 647-672. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12704252/pdf/
Source: Semanticscholar, paper ID 145770231.
Citations: 0
Learn Singularly Perturbed Solutions via Homotopy Dynamics.
Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao
Abstract: Solving partial differential equations (PDEs) using neural networks has become a central focus in scientific machine learning. Training neural networks for singularly perturbed problems is particularly challenging due to certain parameters in the PDEs that introduce near-singularities in the loss function. In this study, we overcome this challenge by introducing a novel method based on homotopy dynamics to effectively manipulate these parameters. From a theoretical perspective, we analyze the effects of these parameters on training difficulty in these singularly perturbed problems and establish the convergence of the proposed homotopy dynamics method. Experimentally, we demonstrate that our approach significantly accelerates convergence and improves the accuracy of these singularly perturbed problems. These findings present an efficient optimization strategy leveraging homotopy dynamics, offering a robust framework to extend the applicability of neural networks for solving singularly perturbed differential equations.
Proceedings of machine learning research, volume 267, pages 9590-9613. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662737/pdf/
Source: Semanticscholar, paper ID 145650332.
Citations: 0
I²MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts.
Jiayi Xin, Sukwon Yun, Jie Peng, Inyoung Choi, Jenna L Ballard, Tianlong Chen, Qi Long
Abstract: Modality fusion is a cornerstone of multimodal learning, enabling information integration from diverse data sources. However, vanilla fusion methods are limited by (1) inability to account for heterogeneous interactions between modalities and (2) lack of interpretability in uncovering the multimodal interactions inherent in the data. To this end, we propose I²MoE (Interpretable Multimodal Interaction-aware Mixture of Experts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions, as well as providing interpretation on a local and global level. First, I²MoE utilizes different interaction experts with weakly supervised interaction losses to learn multimodal interactions in a data-driven way. Second, I²MoE deploys a reweighting model that assigns importance scores for the output of each interaction expert, which offers sample-level and dataset-level interpretation. Extensive evaluation of medical and general multimodal datasets shows that I²MoE is flexible enough to be combined with different fusion techniques, consistently improves task performance, and provides interpretation across various real-world scenarios. Code is available at https://github.com/Raina-Xin/I2MoE.
Proceedings of machine learning research, volume 267, pages 68870-68888. Publication date: 2025-07-01. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13004609/pdf/
Source: Semanticscholar, paper ID 147500701.
Citations: 0