{"title":"Improving generative trajectory prediction via collision-free modeling and goal scene reconstruction","authors":"Zhaoxin Su , Gang Huang , Zhou Zhou , Yongfu Li , Sanyuan Zhang , Wei Hua","doi":"10.1016/j.patrec.2024.12.004","DOIUrl":"10.1016/j.patrec.2024.12.004","url":null,"abstract":"<div><div>In the context of bird’s-eye view traffic scenarios, accurate prediction of future trajectories for various traffic agents (e.g., pedestrians, vehicles) is crucial for driving safety and decision planning. In this paper, we present a generative trajectory prediction framework that incorporates both collision-free modeling and additional reconstruction for future scene context. For the social encoder, we leverage the collision prior by incorporating collision-free constraints (CFC). We construct a social model composed of multiple graphs, where each graph points to the collision prior calculated from uniform direction sampling. In the scene encoder, we employ an attention module to establish connections between trajectory motion and all scene image pixels. Additionally, we reconstruct a goal response map (GRM) aligned with the intended one, thereby enhancing the scene representations. 
Experiments conducted on nuScenes and ETH/UCY datasets demonstrate the superiority of the proposed framework, achieving a 13.8% reduction in off-road rate on nuScenes and an average 13.2% reduction in collision rate on ETH/UCY datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 117-124"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
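The collision-free constraint (CFC) above is described only at a high level. As an illustrative sketch (function and parameter names are my own, not the paper's), the basic idea of penalizing predicted trajectories that bring agents too close can be written as a pairwise hinge penalty:

```python
import numpy as np

def collision_penalty(trajectories, radius=0.5):
    """Penalize predicted future positions of different agents that come
    closer than a safety radius (a simple stand-in for a collision prior).

    trajectories: array of shape (num_agents, num_steps, 2).
    Returns a scalar penalty: sum of hinge violations over agent pairs.
    """
    n = trajectories.shape[0]
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Distance between agents i and j at every future time step.
            d = np.linalg.norm(trajectories[i] - trajectories[j], axis=-1)
            # Hinge loss: positive only when the agents are too close.
            penalty += np.maximum(radius - d, 0.0).sum()
    return penalty
```

In a generative framework, such a term would typically be added to the training loss so that sampled futures are pushed away from colliding configurations; the paper's graph-based formulation with uniform direction sampling is not reproduced here.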
{"title":"Privacy preserving histopathological image augmentation with Conditional Generative Adversarial Networks","authors":"Alexandra-Georgiana Andrei , Mihai Gabriel Constantin , Mara Graziani , Henning Müller , Bogdan Ionescu","doi":"10.1016/j.patrec.2024.12.014","DOIUrl":"10.1016/j.patrec.2024.12.014","url":null,"abstract":"<div><div>Deep learning approaches for histopathology image processing and analysis are gaining increasing interest in the research field, and this comes with a demand to extract more information from images. Pathological datasets are relatively small mainly due to confidentiality of medical data and legal questions, data complexity and labeling costs. Typically, a large number of annotated images for different tissue subtypes are required as training samples to automate the learning algorithms. In this paper, we present a latent-to-image approach for generating synthetic images by applying a Conditional Deep Convolutional Generative Adversarial Network for generating images of human colorectal cancer and healthy tissue. We generate high-quality images of various tissue types that preserve the general structure and features of the source classes, and we investigate an important yet overlooked aspect of data generation: ensuring privacy-preserving capabilities. The quality of these images is evaluated through perceptual experiments with pathologists and the Fréchet Inception Distance (FID) metric. Using the generated data to train classifiers improved MobileNet’s accuracy by 35.36%, and also enhanced the accuracies of DenseNet, ResNet, and EfficientNet. We further validated the robustness and versatility of our model on a different dataset, yielding promising results. 
Additionally, we make a novel contribution by addressing security and privacy concerns in personal medical image data, ensuring that the “fingerprints” of the training medical images are not contained in the synthetic images generated with the model we propose.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 185-192"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
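The abstract evaluates generated tissue images with the Fréchet Inception Distance (FID). For intuition, the Fréchet distance between two Gaussians reduces to a closed form; the sketch below handles the simplified diagonal-covariance case (real FID uses full covariance matrices of Inception features, which requires a matrix square root):

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with *diagonal* covariances,
    the quantity behind the FID metric (real vs. generated feature stats).
    For diagonal covariances the matrix square root is elementwise."""
    mean_term = np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2)
    var1, var2 = np.asarray(var1), np.asarray(var2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term
```

A lower value indicates that real and generated feature distributions are closer; identical statistics give a distance of zero.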
{"title":"Dataset condensation with coarse-to-fine regularization","authors":"Hyundong Jin, Eunwoo Kim","doi":"10.1016/j.patrec.2024.12.018","DOIUrl":"10.1016/j.patrec.2024.12.018","url":null,"abstract":"<div><div>State-of-the-art artificial intelligence models heavily rely on datasets with large numbers of samples, necessitating substantial memory allocation for data storage and high computational costs for model training. To alleviate storage and computational overheads, dataset condensation has recently gained attention. This approach condenses a large sample set into a more compact one while preserving the accuracy of a network trained on the full set. Existing methods focus on aligning the output logits or network parameters trained on synthetic images with those of networks trained on real images. However, these approaches fail to capture diverse information because they do not account for relationships between synthetic images, leading to redundancy across the synthetic set. To address these issues, we exploit the relationships among synthetic samples. This allows us to create diverse representations of synthetic images across distinct classes and to encourage diversity within the same class. We further promote diverse representations between synthetic image sub-regions. Experimental results with various datasets demonstrate that our method outperforms competitors by up to 12.2%. 
Moreover, networks unseen during the condensation process, when trained on our synthesized dataset, outperform those trained with competing methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 178-184"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
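One plausible reading of "exploiting the relationships among synthetic samples" is a regularizer that penalizes pairwise similarity between synthetic images, encouraging each to carry distinct information. A minimal sketch (function name and the cosine-similarity choice are illustrative, not taken from the paper):

```python
import numpy as np

def diversity_penalty(features):
    """Mean off-diagonal cosine similarity between feature vectors of
    synthetic samples. Minimizing this during condensation would push
    synthetic images apart, reducing redundancy.

    features: (num_samples, dim) array.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                          # cosine similarity matrix
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]  # drop self-similarities
    return off_diag.mean()
```

The same penalty could be applied within a class (to diversify same-class samples) or over image sub-regions, mirroring the coarse-to-fine structure the abstract mentions.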
{"title":"Cascading enhancement representation for face anti-spoofing","authors":"Yimei Ma, Yangwei Dong, Jianjun Qian, Jian Yang","doi":"10.1016/j.patrec.2024.11.031","DOIUrl":"10.1016/j.patrec.2024.11.031","url":null,"abstract":"<div><div>Face anti-spoofing (FAS) is the first line of defense in face recognition systems. Most current methods focus on distinguishing live faces from spoof faces by designing an adaptive network with auxiliary information to enhance feature discrimination, which shows that achieving a discriminative representation is vital to solving the FAS task. In this paper, motivated by the idea of cascading enhancement, we propose a novel cascading enhancement representation network (CERN) for effective FAS. Specifically, the CERN utilizes two branches to extract multi-level features in the cascading enhancement feature extraction stage. The first branch employs the backbone network to concatenate multi-scale features in conjunction with attention modules. The second branch utilizes shared attention modules to enhance the input space for learning multi-level refinement features. In the cascading enhancement feature fusion stage, we transmit the high-level feature to the middle level (mid-level) to enhance the mid-level representation. The enhanced mid-level feature is then used to enhance the low-level feature. Moreover, a weight map learning scheme is proposed to further enhance the discrimination of the predicted binary map. Additionally, we use meta-learning to extend our CERN to cross-database testing. 
Experiments on five benchmark databases demonstrate the effectiveness of our method against state-of-the-art methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 53-59"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
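The top-down fusion the abstract describes (high-level features refine mid-level ones, which then refine low-level ones) can be sketched with simple additive fusion and nearest-neighbour upsampling; the real network uses learned modules and attention, which are not reproduced here:

```python
import numpy as np

def cascade_fuse(low, mid, high):
    """Cascading top-down fusion: high-level feature enhances the mid-level
    feature, and the enhanced mid-level feature enhances the low-level one.

    low: (4H, 4W, C), mid: (2H, 2W, C), high: (H, W, C).
    """
    up = lambda x: x.repeat(2, axis=0).repeat(2, axis=1)  # 2x nearest upsample
    mid_enhanced = mid + up(high)       # high -> mid enhancement
    low_enhanced = low + up(mid_enhanced)  # enhanced mid -> low enhancement
    return low_enhanced
```

The key design choice is ordering: semantic context flows downward first, so the finest-resolution features are refined by already-enhanced coarser ones.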
{"title":"Cellular spatial-semantic embedding for multi-label classification of cell clusters in thyroid fine needle aspiration biopsy whole slide images","authors":"Juntao Gao , Jing Zhang , Meng Sun , Li Zhuo","doi":"10.1016/j.patrec.2024.12.012","DOIUrl":"10.1016/j.patrec.2024.12.012","url":null,"abstract":"<div><div>Multi-label classification of cell clusters is crucial for thyroid computer-aided diagnosis. The intricate spatial configurations and multifaceted semantic annotations inherent in thyroid fine-needle aspiration biopsy whole-slide images (FNAB-WSI) pose considerable obstacles to the precise multi-label classification of cell clusters. Considering the complex spatial structures and diverse label semantics in FNAB-WSI, we propose a multi-label classification method for cell clusters using cellular spatial-semantic embedding. This method effectively processes both spatial structure and multi-label semantic information. To address the challenge posed by limited training data for hard-to-classify categories, our method partially masks easily classifiable cells within the multi-label clusters. The preprocessed cell cluster images are then fed into a weighted down-sampling improved Convolutional vision Transformer (wCvT) encoder model to extract spatial features. The probability scores for each label are subsequently obtained through a multi-layer Transformer decoder that integrates both spatial features and label semantics, thus achieving accurate multi-label classification of the cell clusters. Experiments conducted on a self-built FNAB-WSI cell cluster dataset demonstrate a classification accuracy of 90.26 % mAP, surpassing the best comparable method by 4.96 %. Moreover, the model is compact, with only 41.91 million parameters, achieving a tradeoff between accuracy and computational efficiency. 
This means that the proposed method could be utilized as a swift and precise computational intelligence aid for the clinical diagnosis of thyroid cancer.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 125-132"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
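The masking step ("partially masks easily classifiable cells") could be realized by zeroing out image regions of cells the classifier is already confident about, so training emphasizes hard categories. A hypothetical sketch; the box format, confidence source, and threshold are all illustrative assumptions:

```python
import numpy as np

def mask_easy_cells(image, boxes, scores, threshold=0.9):
    """Zero out regions of cells whose classifier confidence exceeds
    `threshold`, leaving hard-to-classify cells visible for training.

    image: (H, W) array; boxes: list of (y0, y1, x0, x1); scores: per box.
    Returns a masked copy; the input image is left untouched.
    """
    out = image.copy()
    for (y0, y1, x0, x1), s in zip(boxes, scores):
        if s > threshold:
            out[y0:y1, x0:x1] = 0.0
    return out
```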
{"title":"Curriculum learning with class-label composition for weakly supervised semantic segmentation","authors":"Dongjun Hwang , Hyoseo Kim , Doyeol Baek , Hyunbin Kim , Inhye Kye , Junsuk Choe","doi":"10.1016/j.patrec.2024.12.016","DOIUrl":"10.1016/j.patrec.2024.12.016","url":null,"abstract":"<div><div>Weakly Supervised Semantic Segmentation (WSSS) aims to build a segmentation network using only weak labels. In WSSS training using image-level labels, a classifier is trained with multi-labeled images, as the task assumes the presence of multiple classes. The classifier plays a crucial role due to its impact on the quality of the derived pseudo-masks. However, training the classifier with the multi-labeled images presents the following two challenges: (1) The presence of frequently co-occurring classes (e.g. <em>chair</em> and <em>dining table</em>) introduces a spurious correlation, making it difficult for the classifier to determine the location of each class. (2) Such multi-labeled datasets often exhibit imbalanced class distributions, which can create challenges during the training process. To tackle these issues, we propose a curriculum learning strategy based on the length and frequency of class-label composition. This strategy gradually reduces the influence of images with spurious correlation between classes and ensures that classes with fewer images appear more frequently during training. 
Our extensive experiments demonstrate that, when applied to eight WSSS methods, our curriculum strategy consistently enhances the quality of the pseudo-labels and segmentation performances, and also reduces the required computational resources for training.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 171-177"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
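A curriculum "based on the length and frequency of class-label composition" might order images so that those with fewer labels (less class co-occurrence, hence less spurious correlation) and rarer classes come first. A hypothetical sketch; the paper's exact schedule and weighting are not reproduced:

```python
from collections import Counter

def curriculum_order(image_labels):
    """Order training images by (number of labels, frequency of their most
    common class): short label compositions and rare classes come earlier.

    image_labels: list of label lists, one per image.
    Returns image indices in curriculum order.
    """
    # Count how many images contain each class.
    freq = Counter(l for labels in image_labels for l in labels)

    def key(i):
        labels = image_labels[i]
        return (len(labels), max(freq[l] for l in labels))

    return sorted(range(len(image_labels)), key=key)
```

Feeding batches in this order approximates "easy, decorrelated examples first"; the influence of heavily co-occurring multi-label images is deferred rather than removed.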
{"title":"Intelligent evaluation of therapeutic effect of electroacupuncture moxibustion on cerebral ischemia reperfusion injury based on multimodal information fusion and neural network","authors":"Shiting Zhu , Shiting Yu , Muhadasi Tuerxunyiming","doi":"10.1016/j.patrec.2024.12.001","DOIUrl":"10.1016/j.patrec.2024.12.001","url":null,"abstract":"<div><div>Ischemic stroke and reperfusion injury pose significant challenges in treatment due to their complex pathophysiology and the difficulty of integrating multimodal brain imaging data. Electroacupuncture has shown potential in alleviating reperfusion injury by modulating physiological responses, but assessing its efficacy remains difficult. This study proposes an intelligent evaluation method for electroacupuncture efficacy by integrating multimodal information from CT and MRI images using advanced machine learning techniques. Specifically, a ResNet50-based Convolutional Neural Network (CNN) is employed, enhanced with a Convolutional Block Attention Module (CBAM) and a Multi-Scale Residual Module (MSRM) to improve feature extraction and fusion across multiple scales and modalities. The proposed approach effectively captures critical patterns and subtle details across different modalities, improving the accuracy of brain injury and recovery assessments. 
In experimental evaluations, the method achieved 97.1% accuracy and a 96.1% F1 score, demonstrating the effectiveness of the proposed method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 164-170"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
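The CBAM channel-attention step mentioned above has a well-known structure: squeeze spatial dimensions with average and max pooling, pass both through a shared two-layer MLP, sum, apply a sigmoid, and rescale channels. A minimal numpy sketch (the paper's MSRM and the spatial-attention half of CBAM are not shown):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention over a (H, W, C) feature map.

    w1: (C, C//r) and w2: (C//r, C) are the shared MLP weights
    (r is the reduction ratio; biases omitted for brevity).
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    relu = lambda x: np.maximum(x, 0.0)
    avg = feat.mean(axis=(0, 1))           # (C,) average-pooled descriptor
    mx = feat.max(axis=(0, 1))             # (C,) max-pooled descriptor
    mlp = lambda v: relu(v @ w1) @ w2      # shared 2-layer MLP
    scale = sigmoid(mlp(avg) + mlp(mx))    # per-channel gate in (0, 1)
    return feat * scale                    # broadcast over H and W
```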
{"title":"An information-theoretic learning model based on importance sampling with application in face verification","authors":"Jiangshe Zhang, Lizhen Ji, Fei Gao, Mengyao Li, Chunxia Zhang, Yukun Cui","doi":"10.1016/j.patrec.2024.11.033","DOIUrl":"10.1016/j.patrec.2024.11.033","url":null,"abstract":"<div><div>A crucial assumption underlying most current machine learning theory is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we develop a learning model based on principles of information theory by minimizing the worst-case loss at prescribed levels of uncertainty. We reformulate the empirical estimation of the risk function and the distribution deviation constraint based on the importance sampling method. The objective of the proposed approach is to minimize the loss under maximum degradation, and hence the resulting problem is a minimax problem, which can be converted to an unconstrained minimization problem using the Lagrange method with the Lagrange multiplier <span><math><mi>T</mi></math></span>. We reveal that the minimization of the objective function under logarithmic transformation is equivalent to the minimization of the <span><math><mi>p</mi></math></span>-norm loss with <span><math><mrow><mi>p</mi><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>T</mi></mrow></mfrac></mrow></math></span>. 
We applied the proposed model to the face verification task, demonstrating enhanced performance both under large distribution deviations and on hard samples.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 81-87"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
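The minimax-to-unconstrained conversion sketched in this abstract follows the standard pattern for divergence-constrained worst-case risk minimization. The following is a generic reconstruction of that pattern (a KL-ball form with illustrative symbols, not the paper's exact importance-sampling formulation):

```latex
% Worst-case risk over distributions q within a divergence ball of radius
% \eta around the training distribution p:
\min_{\theta} \; \max_{q:\, \mathrm{KL}(q \,\|\, p) \le \eta} \; \mathbb{E}_{q}\!\left[ \ell_{\theta}(x) \right]
% Dualizing the constraint with a Lagrange multiplier T > 0 yields the
% unconstrained log-sum-exp (soft-maximum) objective:
\min_{\theta} \; T \log \mathbb{E}_{p}\!\left[ \exp\!\big( \ell_{\theta}(x) / T \big) \right] + T \eta
% Small T approaches the worst-case (max) loss; large T approaches the
% ordinary expected loss, consistent with the stated role of p = 1/T.
```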
{"title":"MedLesSynth-LD: Lesion synthesis using physics-based noise models for robust lesion segmentation in low-data medical imaging regimes","authors":"Ramanujam Narayanan, Vaanathi Sundaresan","doi":"10.1016/j.patrec.2024.12.011","DOIUrl":"10.1016/j.patrec.2024.12.011","url":null,"abstract":"<div><div>Training models for robust lesion segmentation in medical imaging relies on the availability of sufficiently large pathological datasets and high-quality manual annotations. Hence, training such models is challenging in low-data regimes, even for localised lesions with defined boundaries, due to the lack of representation of variations in contrast, texture and sizes. In this work, we propose a lesion simulation method, MedLesSynth-LD, to overcome the lack of diversity in localised lesion characteristics for training robust segmentation models. In MedLesSynth-LD, we used noise models inherently based on the physics involved in the acquisition of modalities to generate sufficiently realistic lesion textures by perturbing healthy tissues. Later, we localised these perturbations within masks defined by composites of ellipsoids (thus forming random shapes) and blended them with the input image with varying contrast. The lesion simulation step does not require training and can be tailored to generate defined, localised lesions to introduce sufficient variability (in size, shape, texture and contrast) in the training data pool. We evaluated the performance of a downstream lesion segmentation task using simulated lesions for multiple publicly available datasets across imaging modalities and organs: brain MRI for tumour and white matter hyperintensity segmentation, liver CT for tumour segmentation, breast ultrasound for tumour segmentation, and retinal fundus imaging for exudate segmentation. 
Using only 75% of labelled real-world data, the proposed method significantly improved lesion segmentation compared to real data-based fully supervised training, with a 16% mean increase in the Dice score (DSC) and a 33% mean decrease in the normalised 95th percentile of the Hausdorff distance (HD95 (norm)). The proposed method also performed better than state-of-the-art lesion segmentation methods in low-data regimes, with a 10% higher mean DSC and a 19% mean decrease in HD95 (norm). The source code is available at <span><span>https://github.com/Ramanujam-N/MedLesSynth-LD</span><svg><path></path></svg></span> [commit SHA cc2b15b].</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 155-163"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
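The simulation recipe above (ellipsoid-composite masks, physics-inspired noise perturbation of healthy tissue, contrast blending) can be sketched in 2-D with a single ellipse and plain Gaussian noise as a crude stand-in for the modality-specific noise models; parameter names and values are illustrative, not taken from MedLesSynth-LD:

```python
import numpy as np

def synthesize_lesion(image, center, radii, contrast=0.3, noise_std=0.05, seed=0):
    """Simulate a localized lesion: build an ellipsoidal mask, perturb the
    tissue inside it with a contrast offset plus Gaussian noise, and leave
    the rest of the image untouched.

    image: (H, W) float array; center: (cy, cx); radii: (ry, rx).
    Returns (lesioned image, boolean lesion mask).
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = center
    ry, rx = radii
    # Points inside the ellipse ((y-cy)/ry)^2 + ((x-cx)/rx)^2 <= 1.
    mask = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    # Perturbed "lesion" texture: contrast shift plus acquisition-like noise.
    texture = image + contrast + rng.normal(0.0, noise_std, image.shape)
    out = image.copy()
    out[mask] = texture[mask]
    return out, mask
```

Composing several such masks with varying centers, radii, and contrast would yield the random shapes and variability the abstract describes, and the mask doubles as the free segmentation label.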
{"title":"Egocentric zone-aware action recognition across environments","authors":"Simone Alberto Peirone , Gabriele Goletto , Mirco Planamente, Andrea Bottino, Barbara Caputo, Giuseppe Averta","doi":"10.1016/j.patrec.2024.12.008","DOIUrl":"10.1016/j.patrec.2024.12.008","url":null,"abstract":"<div><div>Human activities exhibit a strong correlation between actions and the places where these are performed, such as washing something at a sink. More specifically, in daily living environments we may identify particular locations, hereinafter named <em>activity-centric zones</em>, which may afford a set of homogeneous actions. Their knowledge can serve as a prior to favor vision models to recognize human activities. However, the appearance of these zones is scene-specific, limiting the transferability of this prior information to unfamiliar areas and domains. This problem is particularly relevant in egocentric vision, where the environment takes up most of the image, making it even more difficult to separate the action from the context. In this paper, we discuss the importance of decoupling the domain-specific appearance of activity-centric zones from their universal, domain-agnostic representations, and show how the latter can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models. 
We validate our solution on the EPIC-Kitchens-100 and Argo1M datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 140-147"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}