Jose M. Saavedra, Christopher Stears, Waldo Campos
{"title":"Achieving high performance on sketch-based image retrieval without real sketches for training","authors":"Jose M. Saavedra, Christopher Stears, Waldo Campos","doi":"10.1016/j.patrec.2025.04.018","DOIUrl":"10.1016/j.patrec.2025.04.018","url":null,"abstract":"Sketch-based image retrieval (SBIR) has become an attractive area in computer vision. Along with the advances in deep learning, we have seen more sophisticated models for SBIR that have shown increasingly better results. However, these models are still based on supervised learning strategies, requiring the availability of real sketch-photo pairs. Having a paired dataset is impractical in real environments (e.g. eCommerce), which can limit the widespread adoption of this technology. Therefore, based on advances in foundation models for extracting highly semantic features from images, we propose S3BIR-DINOv2, a self-supervised SBIR model that uses pseudo-sketches to address the absence of real sketches for training, learnable vectors to allow the model to hold only one encoder for processing the underlying two image modalities, contrastive learning, and an adapted DINOv2 as the visual encoder. Our experiments show our model performs outstandingly in diverse public datasets without requiring real sketches for training. We reach an overall mAP of 61.10% in Flickr15K and 44.37% in the eCommerce dataset.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 94-100"},"PeriodicalIF":3.9,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143882752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunshi Wang, Chuan Xiong, Bin Zhao, Shuxue Ding
{"title":"CycleMatch: Cyclic pseudo-labeling distillation in semi-supervised medical image segmentation","authors":"Chunshi Wang, Chuan Xiong, Bin Zhao, Shuxue Ding","doi":"10.1016/j.patrec.2025.04.014","DOIUrl":"10.1016/j.patrec.2025.04.014","url":null,"abstract":"In this study, we present a semi-supervised medical image segmentation framework called CycleMatch, which aims to tackle the dependency of fully supervised methods on a large amount of labeled data. By integrating a cyclic pseudo-label distillation mechanism with image-level and feature-level perturbations, CycleMatch effectively leverages unlabeled data to enhance model performance and robustness. Experimental results demonstrate that CycleMatch outperforms existing semi-supervised methods across various data annotation ratios, particularly excelling in scenarios with limited labeled data. Additionally, an in-depth analysis of feature perturbation types and parameter choices further validates CycleMatch’s effectiveness and adaptability in handling different medical image datasets. Overall, CycleMatch offers a new solution for medical image segmentation, showcasing the potential for achieving efficient and accurate segmentation even with limited data.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 135-141"},"PeriodicalIF":3.9,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-erasure enhanced network for occluded person re-identification","authors":"Yunzuo Zhang, Yuehui Yang, Weili Kang, Jiawen Zhen","doi":"10.1016/j.patrec.2025.04.015","DOIUrl":"10.1016/j.patrec.2025.04.015","url":null,"abstract":"Occluded person re-identification is one of the most challenging tasks in safety monitoring. Most existing methods for occluded person re-identification rely on external auxiliary models, which cannot handle non-target pedestrian occlusions and ignore the contextual information of pedestrian images. To address the above issues, we propose a cross-erasure enhanced network (CENet) for occluded person re-identification. To be specific, we propose a feature map cross-erasure module (FMCM) that can simulate obstacle occlusion and non-target pedestrian occlusion in real scenes by erasing feature maps. Meanwhile, we design an occluded-aware mixed attention module (OMAM), which empowers the network to efficiently capture features from non-occluded areas. Finally, we propose a full-view enhancement module (FEM) to extract discriminative features of pedestrian images by parsing the contextual information of the images. Comprehensive experimental outcomes on both occluded and holistic datasets affirm the effectiveness of our method.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 108-114"},"PeriodicalIF":3.9,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulated annealing-based text clustering","authors":"Nacim Fateh Chikhi","doi":"10.1016/j.patrec.2025.04.019","DOIUrl":"10.1016/j.patrec.2025.04.019","url":null,"abstract":"Like traditional K-means, the main drawback of spherical K-means is its high sensitivity to the initialization of centroids. This issue can cause the algorithm to converge to poor local optima, resulting in clusters that do not accurately reflect the true structure of the data. In this paper, we propose two new text clustering algorithms that are less sensitive to initialization and that significantly improve clustering performance. The first algorithm employs simulated annealing to avoid getting trapped in poor local optima. The second algorithm, a relaxed version of simulated annealing, also uses randomization to escape poor local optima but requires significantly fewer computations than the first algorithm. The two algorithms are extensively evaluated across more than thirty text datasets. Experimental results demonstrate that the proposed approaches significantly outperform well-established text clustering algorithms in terms of clustering quality. Furthermore, the second algorithm is as efficient as standard spherical K-means regarding clustering speed, as both have the same time complexity. Finally, an important advantage of the proposed algorithms is that they can be applied to other domains involving directional data, such as recommender systems, social network analysis, image analysis, and more.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 128-134"},"PeriodicalIF":3.9,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcos Sergio Pacheco dos Santos Lima Junior, Ezequiel López-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, José David Fernández-Rodríguez
{"title":"Enhanced generation of automatically labelled image segmentation datasets by advanced style interpreter deep architectures","authors":"Marcos Sergio Pacheco dos Santos Lima Junior, Ezequiel López-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, José David Fernández-Rodríguez","doi":"10.1016/j.patrec.2025.04.021","DOIUrl":"10.1016/j.patrec.2025.04.021","url":null,"abstract":"Large image datasets with annotated pixel-level semantics are necessary to train and evaluate supervised deep-learning models. These datasets are very expensive in terms of the human effort required to build them. Still, recent developments such as DatasetGAN open the possibility of leveraging generative systems to automatically synthesise massive amounts of images along with pixel-level information. This work analyses DatasetGAN and proposes a novel architecture that utilises the semantic information of neighbouring pixels to achieve significantly better performance. Additionally, the overfitting observed in the original architecture is thoroughly investigated, and modifications are proposed to mitigate it. Furthermore, the implementation has been redesigned to greatly reduce the memory requirements of DatasetGAN, and a comprehensive study of the impact of the number of classes in the segmentation task is presented.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 101-107"},"PeriodicalIF":3.9,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143882753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh
{"title":"SYNAuG: Exploiting synthetic data for data imbalance problems","authors":"Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh","doi":"10.1016/j.patrec.2025.04.013","DOIUrl":"10.1016/j.patrec.2025.04.013","url":null,"abstract":"Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues. A straightforward solution is to carefully curate training data, but given the enormous scale of modern neural networks, this is prohibitively labor-intensive and thus impractical. Inspired by recent developments in generative models, this paper explores the potential of synthetic data to address the data imbalance problem. To be specific, our method, dubbed SYNAuG, leverages synthetic data to equalize the unbalanced distribution of training data. Our experiments demonstrate that although a domain gap between real and synthetic data exists, training with SYNAuG followed by fine-tuning with a few real samples allows us to achieve impressive performance on diverse tasks with different data imbalance issues, surpassing existing task-specific methods for the same purpose.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 115-121"},"PeriodicalIF":3.9,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samuel Repka, Bořek Reich, Fedor Zolotarev, Tuomas Eerola, Pavel Zemčík
{"title":"Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks","authors":"Samuel Repka, Bořek Reich, Fedor Zolotarev, Tuomas Eerola, Pavel Zemčík","doi":"10.1016/j.patrec.2025.04.012","DOIUrl":"10.1016/j.patrec.2025.04.012","url":null,"abstract":"We propose a novel Graph Neural Network-based method for segmentation based on data fusion of multimodal Scanning Electron Microscope (SEM) images. In most cases, Backscattered Electron (BSE) images obtained using SEM do not contain sufficient information for mineral segmentation. Therefore, imaging is often complemented with point-wise Energy-Dispersive X-ray Spectroscopy (EDS) spectral measurements that provide highly accurate information about the chemical composition but that are time-consuming to acquire. This motivates the use of sparse spectral data in conjunction with BSE images for mineral segmentation. The unstructured nature of the spectral data makes most traditional image fusion techniques unsuitable for BSE-EDS fusion. We propose using graph neural networks to fuse the two modalities and segment the mineral phases simultaneously. Our results demonstrate that providing EDS data for as few as 1% of BSE pixels produces accurate segmentation, enabling rapid analysis of mineral samples. The proposed data fusion pipeline is versatile and can be adapted to other domains that involve image data and point-wise measurements.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 79-85"},"PeriodicalIF":3.9,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jugurta Montalvão, Gabriel Bastos, Rodrigo Sousa, Ataíde Gualberto
{"title":"On the representation of sparse stochastic matrices with state embedding","authors":"Jugurta Montalvão, Gabriel Bastos, Rodrigo Sousa, Ataíde Gualberto","doi":"10.1016/j.patrec.2025.04.011","DOIUrl":"10.1016/j.patrec.2025.04.011","url":null,"abstract":"Embeddings are adjusted so that points represent states and observations in Markov models, where conditional probabilities are approximately encoded as the exponential of (negative) distances, jointly scaled by a density factor. It is shown that the goodness of this approximation can be managed, mainly if the embedding dimension is chosen as a function of the entropies associated with the corresponding Markov model. Therefore, for sparse (low entropy) models, their representation as state embeddings can save memory and allow fully geometric versions of probabilistic algorithms, such as the Viterbi algorithm, taken as an example in this work. Besides, evidence is also gathered in favor of potentially useful properties that emerge from the geometric representation of Markov models, such as analogies, superstates (aggregation) and semantic fields.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 71-78"},"PeriodicalIF":3.9,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiujuan Zheng, Binghang Zou, Chang Zhang, Haiyan Tu
{"title":"Remote blood pressure estimation using BVP signal features from facial videos","authors":"Xiujuan Zheng, Binghang Zou, Chang Zhang, Haiyan Tu","doi":"10.1016/j.patrec.2025.04.010","DOIUrl":"10.1016/j.patrec.2025.04.010","url":null,"abstract":"Pulse signals contain abundant cardiovascular functional information and can be used for blood pressure estimation. Remote photoplethysmography (rPPG) technology offers a solution to obtain pulse signals from facial videos and then to achieve continuous blood pressure estimation. However, rPPG is susceptible to external factors that lead to a decrease in pulse signal quality, which directly affects the accuracy and reliability of blood pressure estimation. Therefore, this paper proposes a method that integrates advanced signal processing techniques and pulse feature analysis to improve the accuracy of video-based blood pressure estimation. First, we used an adaptive chirp mode decomposition algorithm and a waveform quality analysis algorithm based on a correlation coefficient to suppress noise interference and ensure the effectiveness of the pulse features obtained. Then, we conducted pulse signal feature selection using the mean impact value algorithm and established a blood pressure estimation model based on a BP neural network. Finally, we updated the BP neural network using the sparrow search algorithm to obtain the optimal blood pressure estimation model. Through validation on a private dataset, the results show that the proposed method can meet the blood pressure measurement standards and effectively achieve remote blood pressure estimation.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 122-127"},"PeriodicalIF":3.9,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143882077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fatih Aksu, Fabrizia Gelardi, Arturo Chiti, Paolo Soda
{"title":"Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET","authors":"Fatih Aksu, Fabrizia Gelardi, Arturo Chiti, Paolo Soda","doi":"10.1016/j.patrec.2025.04.001","DOIUrl":"10.1016/j.patrec.2025.04.001","url":null,"abstract":"Accurate classification of histological subtypes of non-small cell lung cancer (NSCLC) is essential in the era of precision medicine, yet current invasive techniques are not always feasible and may lead to clinical complications. This study presents MINT, a Multi-stage INTermediate fusion approach to classify NSCLC subtypes from CT and PET images. Our method integrates the two modalities at different stages of feature extraction, using voxel-wise fusion to exploit complementary information across varying abstraction levels while preserving spatial correlations. We compare our method against unimodal approaches using only CT or PET images to demonstrate the benefits of modality fusion, and further benchmark it against early and late fusion techniques to highlight the advantages of intermediate fusion during feature extraction. Additionally, we compare our model with the only existing intermediate fusion method for histological subtype classification using PET/CT images. Our results demonstrate that the proposed method outperforms all alternatives across key metrics, with an accuracy and AUC equal to 0.724 and 0.681, respectively. This non-invasive approach has the potential to significantly improve diagnostic accuracy, facilitate more informed treatment decisions, and advance personalized care in lung cancer management.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"193","pages":"Pages 86-93"},"PeriodicalIF":3.9,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}