{"title":"Intersecting the Markov Blankets of Endogenous and Exogenous Variables for Causal Discovery","authors":"Yiran Dong;Chuanhou Gao","doi":"10.1109/TPAMI.2025.3564584","DOIUrl":"10.1109/TPAMI.2025.3564584","url":null,"abstract":"Exogenous variables are specially used in Structural Causal Models (SCM), which, however, have some characteristics that are still useful under the property of the Bayesian network. In this paper, we propose a novel causal discovery learning algorithm called Endogenous and Exogenous Markov Blankets Intersection (EEMBI), which combines the properties of Bayesian networks and SCM. Through intersecting the Markov blankets of exogenous variables and endogenous variables (the original variables), EEMBI can remove the irrelevant connections and find the true causal structure theoretically. Furthermore, we propose an extended version of EEMBI, named EEMBI-PC, which integrates the last step of the PC algorithm into EEMBI. This extension enhances the algorithm's performance by leveraging the strengths of both approaches. Plenty of experiments are provided to prove that EEMBI have state-of-the-art performance on continuous datasets, and EEMBI-PC outperforms other algorithms on discrete datasets.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6929-6945"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continual Unsupervised Generative Modeling","authors":"Fei Ye;Adrian G. Bors","doi":"10.1109/TPAMI.2025.3564188","DOIUrl":"10.1109/TPAMI.2025.3564188","url":null,"abstract":"Variational Autoencoders (VAEs), can achieve remarkable results in single tasks, by learning data representations, image generation, or image-to-image translation among others. However, VAEs suffer from loss of information when aiming to continuously learn a sequence of different data domains. This is caused by the catastrophic forgetting, which affects all machine learning methods. This paper addresses the problem of catastrophic forgetting by developing a new theoretical framework which derives an upper bound to the negative sample log-likelihood when continuously learning sequences of tasks. These theoretical derivations provide new insights into the forgetting behavior of learning models, showing that their optimal performance is achieved when a dynamic mixture expansion model adds new components whenever learning new tasks. In our approach we optimize the model size by introducing the Dynamic Expansion Graph Model (DEGM) that dynamically builds a graph structure promoting the positive knowledge transfer when learning new tasks. In addition, we propose a Dynamic Expansion Graph Adaptive Mechanism (DEGAM) that generates adaptive weights to regulate the graph structure, further improving the positive knowledge transfer effectiveness. Experimental results show that the proposed methodology performs better than other baselines in continual learning.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6256-6273"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompt-Based Multi-Interest Learning Method for Sequential Recommendation","authors":"Xue Dong;Xuemeng Song;Tongliang Liu;Weili Guan","doi":"10.1109/TPAMI.2025.3563663","DOIUrl":"10.1109/TPAMI.2025.3563663","url":null,"abstract":"Multi-interest learning method for sequential recommendation aims to predict the next item according to user multi-faceted interests given the user historical interactions. Existing methods mainly consist of a multi-interest extractor that embeds the user interactions into the user multi-interest embeddings, and a multi-interest aggregator that aggregates the learned multi-interest embeddings to the final user embedding, used for predicting the user rating to an item. Despite their effectiveness, existing methods have two key limitations: 1) they directly feed the user interactions into the multi-interest extractor and aggregator, while ignoring their different learning objectives, and 2) they merely consider the centrality of the user interactions to capture the user interests, while overlooking their dispersion. To tackle these limitations, we propose a prompt-based multi-interest learning method (PoMRec), where specific prompts are inserted into the inputted user interactions to make them adaptive to the multi-interest extractor and aggregator. Moreover, we utilize both the mean and variance embeddings of user interactions to embed the user multiple interests for the comprehensively user interest learning. We conduct extensive experiments on three public datasets, and the results verify that our proposed PoMRec outperforms the state-of-the-art multi-interest learning methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6876-6887"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143867030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monge-Ampere Regularization for Learning Arbitrary Shapes From Point Clouds","authors":"Chuanxiang Yang;Yuanfeng Zhou;Guangshun Wei;Long Ma;Junhui Hou;Yuan Liu;Wenping Wang","doi":"10.1109/TPAMI.2025.3563601","DOIUrl":"10.1109/TPAMI.2025.3563601","url":null,"abstract":"As commonly used implicit geometry representations, the signed distance function (SDF) is limited to modeling watertight shapes, while the unsigned distance function (UDF) is capable of representing various surfaces. However, its inherent theoretical shortcoming, i.e., the non-differentiability at the zero-level set, would result in sub-optimal reconstruction quality. In this paper, we propose the scaled-squared distance function (S<sup>2</sup>DF), a novel implicit surface representation for modeling <italic>arbitrary</i> surface types. S<sup>2</sup>DF does not distinguish between inside and outside regions while effectively addressing the non-differentiability issue of UDF at the zero-level set. We demonstrate that S<sup>2</sup>DF satisfies a second-order partial differential equation of Monge-Ampere-type, allowing us to develop a learning pipeline that leverages a novel Monge-Ampere regularization to directly learn S<sup>2</sup>DF from raw unoriented point clouds <italic>without</i> supervision from ground-truth S<sup>2</sup>DF values. Extensive experiments across multiple datasets show that our method significantly outperforms state-of-the-art supervised approaches that require ground-truth surface information as supervision for training.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6809-6822"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143867031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diff-Retinex++: Retinex-Driven Reinforced Diffusion Model for Low-Light Image Enhancement","authors":"Xunpeng Yi;Han Xu;Hao Zhang;Linfeng Tang;Jiayi Ma","doi":"10.1109/TPAMI.2025.3563612","DOIUrl":"10.1109/TPAMI.2025.3563612","url":null,"abstract":"This paper proposes a Retinex-driven reinforced diffusion model for low-light image enhancement, termed Diff-Retinex++, to address various degradations caused by low light. Our main approach integrates the diffusion model with Retinex-driven restoration to achieve physically-inspired generative enhancement, making it a pioneering effort. To be detailed, Diff-Retinex++ consists of two-stage view modules, including the Denoising Diffusion Model (DDM), and the Retinex-Driven Mixture of Experts Model (RMoE). First, DDM treats low-light image enhancement as one type of image generation task, benefiting from the powerful generation ability of diffusion model to handle the enhancement. Second, we design the Retinex theory into the plug-and-play supervision attention module. It leverages the latent features in the backbone and knowledge distillation to learn Retinex rules, and further regulates these latent features through the attention mechanism. In this way, it couples the relationship between Retinex decomposition and image enhancement in a new view, achieving dual improvement. In addition, the Low-Light Mixture of Experts preserves the vividness of the diffusion model and fidelity of the Retinex-driven restoration to the greatest extent. Ultimately, the iteration of DDM and RMoE achieves the goal of Retinex-driven reinforced diffusion model. Extensive experiments conducted on real-world low-light datasets qualitatively and quantitatively demonstrate the effectiveness, superiority, and generalization of the proposed method.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6823-6841"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143867032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse-DeRF: Deblurred Neural Radiance Fields From Sparse View","authors":"Dogyoon Lee;Donghyeong Kim;Jungho Lee;Minhyeok Lee;Seunghoon Lee;Sangyoun Lee","doi":"10.1109/TPAMI.2025.3563398","DOIUrl":"10.1109/TPAMI.2025.3563398","url":null,"abstract":"Recent studies construct deblurred neural radiance fields (DeRF) using dozens of blurry images, which are not practical scenarios if only a limited number of blurry images are available. This paper focuses on constructing DeRF from sparse-view for more pragmatic real-world scenarios. As observed in our experiments, establishing DeRF from sparse views proves to be a more challenging problem due to the inherent complexity arising from the simultaneous optimization of blur kernels and NeRF from sparse view. Sparse-DeRF successfully regularizes the complicated joint optimization, presenting alleviated overfitting artifacts and enhanced quality on radiance fields. The regularization consists of three key components: Surface smoothness, helps the model accurately predict the scene structure utilizing unseen and additional hidden rays derived from the blur kernel based on statistical tendencies of real-world; Modulated gradient scaling, helps the model adjust the amount of the backpropagated gradient according to the arrangements of scene objects; Perceptual distillation improves the perceptual quality by overcoming the ill-posed multi-view inconsistency of image deblurring and distilling the pre-deblurred information, compensating for the lack of clean information in blurry images. We demonstrate the effectiveness of the Sparse-DeRF with extensive quantitative and qualitative experimental results by training DeRF from 2-view, 4-view, and 6-view blurry images.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6842-6858"},"PeriodicalIF":0.0,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143862110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnostic Captioning by Cooperative Task Interactions and Sample-Graph Consistency","authors":"Zhanyu Wang;Lei Wang;Xiu Li;Luping Zhou","doi":"10.1109/TPAMI.2025.3562866","DOIUrl":"10.1109/TPAMI.2025.3562866","url":null,"abstract":"Radiographic images are similar to each other, making it challenging for diagnostic captioning to narrate fine-grained visual differences of clinical importance. In this paper, we propose a self-boosting framework integrating two novel strategies to learn tightly correlated image and text features for diagnostic captioning. The first strategy explicitly aligns image and text features through training an auxiliary task of image-text matching (ITM) jointly with the main task of report generation (RG) as two branches of a network model. The ITM branch explicitly learns image-text alignment and provides highly correlated visual and textual features for the RG branch to generate high-quality reports. The high-quality reports generated by RG branch, in turn, are utilized as additional harder negative samples to push the ITM branch to evolve towards better image-text alignment. These two branches help improve each other progressively, so that the whole model is self-boosted without requiring external resources. The second strategy aligns image-sample space and report-sample space to achieve consistent image and text feature embeddings. To achieve this, the sample graph of the embedded ground-truth reports is built and used as the target to train the sample graph of the embedded images so that the fine discrepancy in the ground-truth reports could be captured by the learned visual feature embeddings. Our proposed framework demonstrates its superiority on two medical report generation benchmarks, including the largest dataset MIMIC-CXR.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6585-6598"},"PeriodicalIF":0.0,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143862348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unified Domain Adaptive Semantic Segmentation","authors":"Zhe Zhang;Gaochang Wu;Jing Zhang;Xiatian Zhu;Dacheng Tao;Tianyou Chai","doi":"10.1109/TPAMI.2025.3562999","DOIUrl":"10.1109/TPAMI.2025.3562999","url":null,"abstract":"Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled and shifted target domain. The majority of existing UDA-SS works typically consider images whilst recent attempts have extended further to tackle videos by modeling the temporal dimension. Although two lines of research share the major challenges – overcoming the underlying domain distribution shift, their studies are largely independent. It causes several issues: (1) The insights gained from each line of research remain fragmented, leading to a lack of holistic understanding of the problem and potential solutions. (2) Preventing the unification of methods and best practices across two scenarios (images and videos) will lead to redundant efforts and missed opportunities for cross-pollination of ideas. (3) Without a unified approach, the knowledge and advancements made in one scenario may not be effectively transferred to the other, leading to suboptimal performance and slower progress. Under this observation, we advocate unifying the study of UDA-SS across video and image scenarios, enabling a more comprehensive understanding, synergistic advancements, and efficient knowledge sharing. To that end, we explore the unified UDA-SS from a general domain augmentation perspective, serving as a unifying framework, enabling improved generalization, and potential for cross-pollination, ultimately contributing to the practical impact and overall progress. Specifically, we propose a Quad-directional Mixup (QuadMix) method, characterized by tackling intra-domain discontinuity, fragmented gap bridging, and feature inconsistencies through four-directional paths designed for intra- and inter-domain mixing within an explicit feature space. To deal with temporal shifts within videos, we incorporate optical flow-guided feature aggregation across spatial and temporal dimensions for fine-grained domain alignment, which is extendable to image scenarios. Extensive experiments show that QuadMix outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6731-6748"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning","authors":"Yufei Kuang;Xijun Li;Jie Wang;Fangzhou Zhu;Meng Lu;Zhihai Wang;Jia Zeng;Houqiang Li;Yongdong Zhang;Feng Wu","doi":"10.1109/TPAMI.2025.3562286","DOIUrl":"10.1109/TPAMI.2025.3562286","url":null,"abstract":"As one of the most critical components in modern LP solvers, presolve in linear programming (LP) employs a rich set of presolvers to remove different types of redundancy in input problems by equivalent transformations. We found from extensive experiments that the presolve routine—that is, the method determining (P1) which presolvers to select, (P2) in what order to execute, and (P3) when to stop—significantly impacts the efficiency of solving LPs. However, designing high-quality presolve routines is highly challenging due to the enormous search space, and further optimizing the routines on different tasks for high performance demands extensive domain knowledge and manual tuning. To tackle this problem, we propose the <italic>first</i> learning based framework—that is, reinforcement learning for presolve (RL4Presolve)—to learn high-quality presolve routines. An appealing feature is that we employ a novel adaptive action sequence that learns complex routines efficiently by generating combinations of presolvers automatically at each step. Extensive experiments demonstrate that RL4Presolve achieves significant improvement (up to roughly 90% ) in the efficiency of solving LPs. Furthermore, we extract routines from learned policies for simple and efficient deployment without GPU resources to Huawei's supply chain, where extensive manual tuning for each separate task was required previously due to the high economic value.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6660-6672"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Evaluation Metrics of Open-Vocabulary Segmentation","authors":"Hao Zhou;Lu Qi;Tiancheng Shen;Hai Huang;Xu Yang;Xiangtai Li;Ming-Hsuan Yang","doi":"10.1109/TPAMI.2025.3562930","DOIUrl":"10.1109/TPAMI.2025.3562930","url":null,"abstract":"This paper highlights a problem of evaluation metrics adopted in the open-vocabulary segmentation. The evaluation process relies heavily on closed-set metrics on zero-shot or cross-dataset pipelines without considering the similarity between predicted and ground truth categories. We first survey eleven similarity measurements between two categorical words using WordNet linguistics statistics, text embedding, or language models by comprehensive quantitative analysis and user study to tackle this issue. Based on those explored measurements, we design novel evaluation metrics, Open mIoU, Open AP, and Open PQ, tailored for three open-vocabulary segmentation tasks. We benchmark the proposed evaluation metrics on twelve open-vocabulary methods in three segmentation tasks. Despite the relative subjectivity of similarity distance, we demonstrate that our metrics can still well evaluate the open ability of the existing open-vocabulary segmentation methods. We hope our work can bring the community new thinking about evaluating model ability for open-vocabulary segmentation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6780-6796"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}