{"title":"Structure-informed Risk Minimization for Robust Ensemble Learning.","authors":"Fengchun Qiao, Yanlin Chen, Xi Peng","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Ensemble learning is a powerful approach for improving generalization under distribution shifts, but its effectiveness heavily depends on how individual models are combined. Existing methods often optimize ensemble weights based on validation data, which may not represent unseen test distributions, leading to suboptimal performance in out-of-distribution (OoD) settings. Inspired by Distributionally Robust Optimization (DRO), we propose Structure-informed Risk Minimization (SRM), a principled framework that learns robust ensemble weights without access to test data. Unlike standard DRO, which defines uncertainty sets based on divergence metrics alone, SRM incorporates structural information of training distributions, ensuring that the uncertainty set aligns with plausible real-world shifts. This approach mitigates the over-pessimism of traditional worst-case optimization while maintaining robustness. We introduce a computationally efficient optimization algorithm with theoretical guarantees and demonstrate that SRM achieves superior OoD generalization compared to existing ensemble combination strategies across diverse benchmarks. Code is available at: https://github.com/deep-real/SRM.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"50130-50141"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13131038/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147824469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate Identification of Communication Between Multiple Interacting Neural Populations.","authors":"Belle Liu, Jacob Sacks, Matthew D Golub","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Neural recording technologies now enable simultaneous recording of population activity across many brain regions, motivating the development of data-driven models of communication between brain regions. However, existing models can struggle to disentangle the sources that influence recorded neural populations, leading to inaccurate portraits of inter-regional communication. Here, we introduce Multi-Region Latent Factor Analysis via Dynamical Systems (MR-LFADS), a sequential variational autoencoder designed to disentangle inter-regional communication, inputs from unobserved regions, and local neural population dynamics. We show that MR-LFADS outperforms existing approaches at identifying communication across dozens of simulations of task-trained multi-region networks. When applied to large-scale electrophysiology, MR-LFADS predicts brain-wide effects of circuit perturbations that were held out during model fitting. These validations on synthetic and real neural data position MR-LFADS as a promising tool for discovering principles of brain-wide information processing.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"39381-39404"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715561/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145806697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yash Shah, Camila Gonzalez, Mohammad H Abbasi, Qingyu Zhao, Kilian M Pohl, Ehsan Adeli
{"title":"Confounder-Free Continual Learning via Recursive Feature Normalization.","authors":"Yash Shah, Camila Gonzalez, Mohammad H Abbasi, Qingyu Zhao, Kilian M Pohl, Ehsan Adeli","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Confounders are extraneous variables that affect both the input and the target, resulting in spurious correlations and biased predictions. There are recent advances in dealing with or removing confounders in traditional models, such as metadata normalization (MDN), where the distribution of the learned features is adjusted based on the study confounders. However, in the context of continual learning, where a model learns continuously from new data over time without forgetting, learning feature representations that are invariant to confounders remains a significant challenge. To remove their influence from intermediate feature representations, we introduce the Recursive MDN (R-MDN) layer, which can be integrated into any deep learning architecture, including vision transformers, and at any model stage. R-MDN performs statistical regression via the recursive least squares algorithm to maintain and continually update an internal model <i>state</i> with respect to changing distributions of data and confounding variables. Our experiments demonstrate that R-MDN promotes equitable predictions across population groups, both within static learning and across different stages of continual learning, by reducing catastrophic forgetting caused by confounder effects changing over time.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"54112-54142"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823023/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146032030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Osman Berke Guney, Ketan Suhaas Saichandran, Karim Elzokm, Ziming Zhang, Vijaya B Kolachalama
{"title":"Active Feature Acquisition Via Explainability-driven Ranking.","authors":"Osman Berke Guney, Ketan Suhaas Saichandran, Karim Elzokm, Ziming Zhang, Vijaya B Kolachalama","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In many practical applications, including medicine, acquiring all relevant data for machine learning models is often infeasible due to constraints on time, cost, and resources. This makes it important to selectively acquire only the most informative features, yet traditional static feature selection methods fall short in scenarios where feature importance varies across instances. Here, we propose an active feature acquisition (AFA) framework, which dynamically selects features based on their importance to each individual case. Our method leverages local explanation techniques to generate instance-specific feature importance rankings. We then reframe the AFA problem as a feature prediction task, introducing a policy network grounded in a decision transformer architecture. This policy network is trained to select the next most informative feature by learning from the feature importance rankings. As a result, features are acquired sequentially, ordered by their predictive significance, leading to more efficient feature selection and acquisition. Extensive experiments on multiple datasets demonstrate that our approach outperforms current state-of-the-art AFA methods in predictive accuracy and feature acquisition efficiency. These findings highlight the promise of an explainability-driven AFA strategy in scenarios where feature acquisition is a concern.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"20748-20765"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12661659/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145650405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery.","authors":"Ning Liu, Yue Yu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Attention mechanisms have emerged as transformative tools in core AI domains such as natural language processing and computer vision. Yet, their largely untapped potential for modeling intricate physical systems presents a compelling frontier. Learning such systems often entails discovering operators that map between functional spaces using limited instances of function pairs-a task commonly framed as a severely ill-posed inverse PDE problem. In this work, we introduce Neural Interpretable PDEs (NIPS), a novel neural operator architecture that builds upon and enhances Nonlocal Attention Operators (NAO) in both predictive accuracy and computational efficiency. NIPS employs a linear attention mechanism to enable scalable learning and integrates a learnable kernel network that acts as a channel-independent convolution in Fourier space. As a consequence, NIPS eliminates the need to explicitly compute and store large pairwise interactions, effectively amortizing the cost of handling spatial interactions into the Fourier transform. Empirical evaluations demonstrate that NIPS consistently surpasses NAO and other baselines across diverse benchmarks, heralding a substantial leap in scalable, interpretable, and efficient physics learning. Our code and data accompanying this paper are available at https://github.com/fishmoon1234/Nonlocal-Attention-Operator.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"40085-40099"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981550/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147470090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Diffusion Duality.","authors":"Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, <b>doubling training speed</b> by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks <b>few-step generation in diffusion language models</b> by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: https://s-sahoo.com/duo.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"52584-52619"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12975121/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147438095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maria Despoina Siampou, Jialiang Li, John Krumm, Cyrus Shahabi, Hua Lu
{"title":"Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications.","authors":"Maria Despoina Siampou, Jialiang Li, John Krumm, Cyrus Shahabi, Hua Lu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Encoding geospatial objects is fundamental for geospatial artificial intelligence (GeoAI) applications, which leverage machine learning (ML) models to analyze spatial information. Common approaches transform each object into known formats, like image and text, for compatibility with ML models. However, this process often discards crucial spatial information, such as the object's position relative to the entire space, reducing downstream task effectiveness. Alternative encoding methods that preserve some spatial properties are often devised for specific data objects (e.g., point encoders), making them unsuitable for tasks that involve different data types (i.e., points, polylines, and polygons). To this end, we propose Poly2Vec, a polymorphic Fourier-based encoding approach that unifies the representation of geospatial objects, while preserving the essential spatial properties. Poly2Vec incorporates a learned fusion module that adaptively integrates the magnitude and phase of the Fourier transform for different tasks and geometries. We evaluate Poly2Vec on five diverse tasks, organized into two categories. The first empirically demonstrates that Poly2Vec consistently outperforms object-specific baselines in preserving three key spatial relationships: topology, direction, and distance. The second shows that integrating Poly2Vec into a state-of-the-art GeoAI workflow improves the performance in two popular tasks: population prediction and land use inference.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"55511-55532"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13078377/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147693992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing Heterogeneous Statistical Strength for Personalized Federated Learning via Hierarchical Bayesian Inference.","authors":"Mahendra Singh Thapa, Rui Li","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Personalized federated learning (PFL) based on Bayesian approach tackle the challenges from statistical heterogeneity of client data by computing a personalized posterior distribution over the parameters of each client's local model and constructing a global distribution by aggregating the parameters of these personalized posteriors. However, the heuristic aggregation methods introduce strong biases and result in global models with poor generalization. We thus propose a novel hierarchical Bayesian inference framework for PFL by specifying a conjugate hyper-prior over the parameters of the personalized posteriors. This allows us to jointly compute a global posterior distribution for aggregation and the personalized ones at local level. This hierarchical Bayesian inference framework achieves elegant balance between local personalization and global model robustness. Extensive empirical study shows that by effectively sharing the heterogeneous statistical strength across the local models while retaining their distinctive characteristics, our framework yields state-of-the-art performance. We also show that existing Bayesian PFLs are special cases of our framework.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"267 ","pages":"59285-59303"},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13050239/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147624937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shahriar Noroozizadeh, Pim Welle, Jeremy C Weiss, George H Chen
{"title":"The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis.","authors":"Shahriar Noroozizadeh, Pim Welle, Jeremy C Weiss, George H Chen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This study quantifies the association between non-adherence to antipsychotic medications and adverse outcomes in individuals with schizophrenia. We frame the problem using survival analysis, focusing on the time to the earliest of several adverse events (early death, involuntary hospitalization, jail booking). We extend standard causal inference methods (T-learner, S-learner, nearest neighbor matching) to utilize various survival models to estimate individual and average treatment effects, where treatment corresponds to medication non-adherence. Analyses are repeated using different amounts of longitudinal information (3, 6, 9, and 12 months). Using data from Allegheny County in western Pennsylvania, we find strong evidence that non-adherence advances adverse outcomes by approximately 1 to 4 months. Ablation studies confirm that county-provided risk scores adjust for key confounders, as their removal amplifies the estimated effects. Subgroup analyses by medication formulation (injectable vs. oral) and medication type consistently show that non-adherence is associated with earlier adverse events. These findings highlight the clinical importance of adherence in delaying psychiatric crises and show that integrating survival analysis with causal inference tools can yield policy-relevant insights. We caution that although we apply causal inference, we only make associative claims and discuss assumptions needed for causal interpretation.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"287 ","pages":"573-609"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444782/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.","authors":"Adedolapo Aishat Toye, Asuman Celik, Samantha Kleinberg","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Missing data is pervasive in healthcare. Many imputation methods exist to fill in missing values, yet most were evaluated using randomly deleted values rather than the actual mechanisms they were designed to address. We aimed to determine real-world accuracy for missing data imputation with three missing data mechanisms (missing completely at random, MCAR; missing at random, MAR; and not missing at random, NMAR) for state of the art and commonly used imputation methods. Using two time series data targets (continuous glucose monitoring, Loop dataset; heart rate, All of Us dataset) we simulated missingness by masking values for each mechanism, at a range of missingness percentages (5-30%) and tested 12 imputation methods. We evaluated accuracy with multiple metrics including root mean square error (RMSE) and bias. We found that overall, accuracy was significantly better on MCAR than on MAR and NMAR, despite many methods being developed for those mechanisms. Linear interpolation had the lowest RMSE with all mechanisms and for all demographic groups, with low bias. This study shows that current evaluation practices do not provide an accurate picture of real world performance with realistic patterns of missingness. Future research is needed to develop evaluation practices that better capture real-world accuracy, and methods that better address real-world mechanisms.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"287 ","pages":"480-501"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12392262/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}