{"title":"TSI-GCN: Translation and scaling invariant GCN for 3D point cloud analysis","authors":"Zijin Du , Jiye Liang , Kaixuan Yao , Feilong Cao","doi":"10.1016/j.patrec.2025.04.037","DOIUrl":"10.1016/j.patrec.2025.04.037","url":null,"abstract":"<div><div>Point cloud is a crucial data format for 3D vision, but its irregularity makes it challenging to comprehend the associated geometric information. Although some previous research has attempted to improve deep learning on point cloud and achieved promising results, they often overlook the robust shape descriptors of 3D targets, making them susceptible to translation and scaling transformations. This paper proposes a novel framework for point cloud analysis, to achieve feature extraction with translation and scaling invariance. It mainly includes local adaptive kernel, translation and scaling invariant convolution (TSIConv), and graph attention pooling. The key component is the design of TSIConv, which extracts the shape information with translation and scaling invariance. Then it performs convolution with local adaptive kernels to capture the features in various shape structures. Following the convolution layer, we add the graph attention pooling to coarsen point cloud, thus achieving multi-scale analysis and computational overhead reduction. The proposed framework, consisting of two networks, completes point cloud classification and part segmentation tasks in an end-to-end manner. The property analysis and experiments demonstrate that our model strictly guarantees the translation and scaling invariance, meanwhile achieving comparable performance to previous methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"195 ","pages":"Pages 30-36"},"PeriodicalIF":3.9,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image segmentation via two-step deep variational priors","authors":"Lu Tan , Xue-Cheng Tai , Ling Li , Wan-Quan Liu , Raymond H. Chan , Dan-Feng Hong","doi":"10.1016/j.patrec.2025.04.030","DOIUrl":"10.1016/j.patrec.2025.04.030","url":null,"abstract":"<div><div>This paper proposes an iterative deep variational approach for image segmentation in a fusion manner: it is not only able to realize selective segmentation, but can also alleviate the issue of parameter/initialization dependency. Moreover, it possesses a refinement process designed to handle challenging scenarios, such as images containing obscured, damaged, or absent objects, or those with complex backgrounds. Our proposed approach consists of two main procedures, i.e., selective segmentation and shape transformation. The first procedure works as a stem in a totally unsupervised way. A convolutional neural network (CNN) based architecture is properly incorporated into the selective weighting constrained variational segmentation model. The second procedure is to further refine the outputs. This part can be achieved in two ways: one direction is to establish a joint model with the semantic shape constraint. The other technical direction is to make the shape descriptor separated from the joint model and work as an individual unit. In the proposed approach, the minimization problem is transformed from iterative minimization for each variable to automatically minimizing the loss function by learning the generator network parameters. This also leads to a good inductive bias associated with classic variational methods. Extensive experiments have demonstrated the significant advantages.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"195 ","pages":"Pages 44-50"},"PeriodicalIF":3.9,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Stable Diffusion for Distortion Correction and Image Rectification","authors":"Pooja Kumari, Sukhendu Das","doi":"10.1016/j.patrec.2025.04.033","DOIUrl":"10.1016/j.patrec.2025.04.033","url":null,"abstract":"<div><div>Image rectification and distortion correction are fundamental tasks in the field of image processing and computer vision, with it is applications ranging from document processing to medical imaging. This study presents a novel Conditional Stable Diffusion framework designed to tackle the challenges posed by diverse types of image distortions. Unlike existing traditional methods, our approach introduces an adaptive diffusion process that customizes its behavior based on the specific characteristics of the input image. By introducing controlled noise in a bidirectional manner, our model learns to interpret and refine various distortion patterns and progressively refines the image into a more uniform distribution. Furthermore, to complement the diffusion process, we incorporate a Guided Rectification Network (GRN) that generates reliable conditions from the input image, effectively reducing ambiguity between the distorted and target outputs. The integration of stable diffusion is justified by its versatility in handling diverse types and degrees of distortion. Our proposed method effectively handles a wide range of distortions—including projective and complex lens-based distortions such as barrel and pincushion—by dynamically adapting to each unique distortion type. Whether stemming from lens abnormalities, perspective discrepancies, or other factors, our proposed stable diffusion-based method consistently adapts to the specific characteristics of the distortion, yielding superior outcomes. Experimental results across benchmark datasets demonstrate that our method consistently outperforms existing state-of-the-art approaches. Additionally, we highlight that our work is the first instance of the diffusion method being used to simultaneously address various distortion types (barrel, pincushion, lens, etc.) for multi-distortion image rectification. This Conditional Stable Diffusion framework thus offers a promising advancement for robust and versatile image distortion correction.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"194 ","pages":"Pages 62-70"},"PeriodicalIF":3.9,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis","authors":"Nimol Thuon , Jun Du , Panhapin Theang , Ranysakol Thuon","doi":"10.1016/j.patrec.2025.04.031","DOIUrl":"10.1016/j.patrec.2025.04.031","url":null,"abstract":"<div><div>Recognizing text from palm leaf manuscripts in low-resource, non-Latin languages like Balinese, Khmer, and Sundanese poses significant challenges due to limited annotated data and complex structures. Unlike modern languages, these ancient scripts exhibit unique linguistic complexities that hinder effective recognition and digital preservation. Building on the success of syllable analysis augmentation for the Khmer script, we propose a framework, PALM-SADA, for multi-script recognition. PALM-SADA integrates visual and linguistic processing using a hybrid CNN-Transformer architecture. The framework introduces syllable analysis augmentation techniques, consisting of two main components. (1) Monosyllabic synthesis generates single-syllable words by combining glyphs from isolated glyph datasets using predefined grammar forms. And (2) Polysyllabic synthesis creates longer, grammatically correct text sequences by combining monosyllabic words and isolated glyphs. To ensure linguistic integrity, grammar forms and vocabulary lists of complete words were meticulously designed and validated, preserving the linguistic characteristics of the augmented data. For recognition, PALM-SADA employs a hybrid CNN-Transformer network that enhances both feature extraction and transcription accuracy. CNN layers capture local features, while Transformer layers model global dependencies. A Transformer-based decoder further refines transcriptions by leveraging contextual relationships within the text. Experiments conducted on the ICFHR 2018 contest datasets demonstrate that PALM-SADA significantly outperforms existing methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"195 ","pages":"Pages 8-15"},"PeriodicalIF":3.9,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAMFND: Cross-modal adaptive-aware learning for multimodal fake news detection","authors":"Ying Guo , Yuan Li , Kexin Zhen , Bingxin Li , Jie Liu","doi":"10.1016/j.patrec.2025.02.035","DOIUrl":"10.1016/j.patrec.2025.02.035","url":null,"abstract":"<div><div>Recently, there has been a growing focus on the automatic identification of multimodal fake news detection. A fundamental challenge of multimodal fake news detection lies in the inherent semantic ambiguity across different content modalities. Decisions stemming from distinct unimodal sources may exhibit discrepancies, potentially creating inconsistency with the collective insights derived from multimodal data fusion. To address this issue, we propose CAMFND: a cross-modal adaptive-aware learning framework for multi-modal fake news detection, aiming to reduce semantic ambiguities among different modalities. CAMFND consists of (1) a cross-modal alignment module to transform the heterogeneous unimodality features into a shared semantic space, (2) a cross-modal adaptive-interactive module to capture the semantic correlation and consistency, computed by the multi-modal gated fusion unit, (3) a cross-modal adaptive-selective module to decide the semantic meaning or bias, guided by the multi-modal semantic matching score. CAMFND enhances the fake news detection by intelligently and dynamically combining features from uni-modality and identifying correlations across different modalities. It leverages unimodal features in scenarios with low cross-modal ambiguity, while utilizing cross-modal correlations in cases of high cross-modal uncertainty. The experimental results show that CAMFND significantly surpasses prior methodologies and sets new benchmarks on both English Twitter and Chinese Weibo datasets, marking a notable advancement in performance.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"195 ","pages":"Pages 1-7"},"PeriodicalIF":3.9,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Negotiation games with structured post-hoc intents","authors":"David Warren, Mark Dras, Malcolm Ryan","doi":"10.1016/j.patrec.2025.04.029","DOIUrl":"10.1016/j.patrec.2025.04.029","url":null,"abstract":"<div><div>An important class of negotiation games that use human language do not have predefined ‘moves’: it is up to the agents in the game to define moves via natural language that will lead them towards their goal. In the context of other games, however, a notion of <em>intents</em> — structured moves from a predefined set — have been found to be useful. In this paper, we show that it is possible to define and learn <em>post-hoc intents</em> in a practical way for AI agents in a negotiation game, using a text-to-text Transformer model; we show that this improves agent performance, and further allows the definition of a wider range of agents for training.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"195 ","pages":"Pages 23-29"},"PeriodicalIF":3.9,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pyramid-structured multi-scale transformer for efficient semi-supervised video object segmentation with adaptive fusion","authors":"Yunzuo Zhang, Puze Yu, Yaoge Xiao, Shuangshuang Wang","doi":"10.1016/j.patrec.2025.04.027","DOIUrl":"10.1016/j.patrec.2025.04.027","url":null,"abstract":"<div><div>In recent years, Transformer-based methods have demonstrated promising performance in the field of semi-supervised video object segmentation. However, these methods require the maintenance of a memory frame from memory bank, which leads to an exponential increase in GPU memory requirements as the length of the video increases, necessitating updates of the memory bank every few frames. We propose a novel approach based on a multi-scale pyramid structure for object association with transformers, which can effectively encode both global and local features at different granularity levels, while significantly reducing GPU memory requirements as video length increases, thus maintaining high inference speed. To effectively integrate multi-scale ID embeddings and video frame embeddings, rather than simply overlaying the original features through addition, we have designed an adaptive fusion module to address this issue. We conducted extensive experiments on four commonly used VOS benchmarks (including YouTube-VOS 2018 and 2019 Val, DAVIS-2017,and LVOS), evaluating various variants of AOT. Our method outperformed state-of-the-art competitors and consistently demonstrated superior efficiency and scalability across all four benchmark tests.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"194 ","pages":"Pages 48-54"},"PeriodicalIF":3.9,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From a perceptual perspective: No-Reference Image Quality Assessment using Dual Perception Hybrid Network","authors":"Yang Lu , Zifan Yang , Zilu Zhou , Gaowei Zhang , Xiaoheng Jiang , Mingliang Xu","doi":"10.1016/j.patrec.2025.04.035","DOIUrl":"10.1016/j.patrec.2025.04.035","url":null,"abstract":"<div><div>The goal of the No-Reference Image Quality Assessment is to simulate human perception of image quality without the availability of a reference image. Previous research has largely focused on extracting content features from distorted images, perceiving distortion types pixel-by-pixel or in blocks while neglecting the establishment of a relationship between the quality of distorted and reference images. To address this issue, this paper proposes a Dual Perception Hybrid Network (DPHN), where dual perception refers to the parallel extraction of quality and content features. Quality perception involves constructing a quality relationship by leveraging the difference between the features of the distorted image and the reconstructed image, while content perception focuses on learning the content information of the distortion itself from the distorted image. To demonstrate the effectiveness of the proposed Dual Perception Fusion Network, we utilised four representative IQA datasets. Extensive experimental results show that the proposed network exhibits promising performance. Our code will be available at <span><span>https://github.com/YZFzzu/DPHN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"194 ","pages":"Pages 55-61"},"PeriodicalIF":3.9,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Neural Networks for 3D facial morphology: Assessing the effectiveness of anthropometric and automated landmark detection","authors":"Giuseppe Maurizio Facchi , Giuliano Grossi , Alessandro D’Amelio , Francesco Agnelli , Chiarella Sforza , Gianluca Martino Tartaglia , Raffaella Lanzarotti","doi":"10.1016/j.patrec.2025.04.028","DOIUrl":"10.1016/j.patrec.2025.04.028","url":null,"abstract":"<div><div>This study investigates the potential of Graph Neural Networks (GNNs) for analyzing 3D facial morphology, leveraging facial landmarks as graph nodes to capture the intrinsic structure of 3D face scans. This research evaluates the effectiveness of three distinct approaches for defining graph vertices by associating them with: (1) a well-established set of anthropometric landmarks identified through tactile assessment, widely considered the gold standard in facial anthropometry; (2) automatically detected 3D facial keypoints estimated using advanced algorithms; and (3) geometry-based random point cloud sub-sampling via farthest point sampling (FPS). To evaluate the effectiveness of GNNs and facial landmarks in capturing and representing meaningful morphological patterns, the study employs two benchmark tasks: gender classification and age regression. Extensive experiments across various GNN architectures and three datasets — each presenting diverse and challenging conditions — demonstrate that semantically meaningful landmarks, whether anthropometric or automatically detected, consistently outperform non-semantic random samples in both tasks and across all datasets. These results highlight the crucial role of semantic contextualization in graph-based facial analysis. Notably, models utilizing automatically detected facial keypoints achieved performance comparable to those based on manually annotated anthropometric landmarks, offering a scalable and cost-effective alternative without compromising accuracy. These findings support the integration of automated GNN-based methodologies into a wide range of applications, including clinical diagnosis, forensic analysis, and biometric recognition.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"195 ","pages":"Pages 16-22"},"PeriodicalIF":3.9,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing random surface anomaly detection in real-world using a four-stage one-class approach","authors":"Pulin Li , Guocheng Wu , Yanjie Zhou , Jiewu Leng","doi":"10.1016/j.patrec.2025.05.002","DOIUrl":"10.1016/j.patrec.2025.05.002","url":null,"abstract":"<div><div>Defect detection and localization are critical for quality control in manufacturing, yet existing algorithms and models trained on laboratory datasets often fail in real industrial scenarios due to their static nature, especially in non-mass production. Moreover, limited and heterogeneous defective samples, coupled with costly human annotation, highlight the need for unsupervised methods relying solely on normal images. To address these challenges, we propose the Random Surface Anomaly Detection (RSAD) model, a four-stage one-class anomaly detection and localization approach. Initially, leveraging embedding-based techniques, we introduce transfer learning with a pretrained ImageNet network in extracting locally aggregated features. Next, adapter tuning is applied to transfer these features into the industrial domain, reducing bias towards natural images. Additionally, random Gaussian noise is introduced into normal feature representations within the feature space and a discriminator then scores feature normality. Finally, experiments on the MPDD dataset and other benchmarks, demonstrate the RSAD model's state-of-the-art (SOTA) performance in anomaly detection, validating its trustworthiness in real-world manufacturing environments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"194 ","pages":"Pages 32-40"},"PeriodicalIF":3.9,"publicationDate":"2025-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143934618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}