{"title":"Incremental watershed cuts: Interactive segmentation algorithm with parallel strategy","authors":"Quentin Lebon , Josselin Lefèvre , Jean Cousty , Benjamin Perret","doi":"10.1016/j.patrec.2024.12.005","DOIUrl":"10.1016/j.patrec.2024.12.005","url":null,"abstract":"<div><div>In this article, we design an incremental method for computing seeded watershed cuts for interactive image segmentation. We propose an algorithm based on the hierarchical image representation called the binary partition tree to compute a seeded watershed cut. Additionally, we leverage properties of minimum spanning forests to introduce a parallel method for labeling a connected component. We show that these algorithms fit naturally into an interactive segmentation process by handling user interactions, such as seed addition or removal, in linear time with respect to the number of affected pixels. Run-time comparisons with several state-of-the-art interactive and non-interactive watershed methods show that the proposed method handles user interactions much faster than previous methods, with a significant speedup ranging from 10 to 60 on both 2D and 3D images, thus improving the user experience on large images.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 256-263"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
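The seeded-watershed idea in the abstract above rests on a classical equivalence: a watershed cut is a minimum spanning forest rooted at the seeds. A minimal Kruskal-style sketch of that non-incremental baseline (this is the standard construction, not the paper's incremental binary-partition-tree algorithm) looks like this:

```python
# Kruskal-style seeded watershed cut: process edges by increasing weight and
# merge components with union-find, except when the merge would join two
# components that already carry different seed labels (a watershed edge).

def watershed_cut(n_pixels, edges, seeds):
    """edges: list of (weight, u, v); seeds: dict mapping pixel -> label."""
    parent = list(range(n_pixels))
    label = [seeds.get(p) for p in range(n_pixels)]

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        lu, lv = label[ru], label[rv]
        if lu is not None and lv is not None and lu != lv:
            continue  # both sides already seeded differently: keep them apart
        parent[ru] = rv
        label[rv] = lu if lv is None else lv  # propagate the seed label

    return [label[find(p)] for p in range(n_pixels)]
```

On a 4-pixel path with a high-weight edge in the middle and seeds at both ends, the forest splits at the costly edge, labeling each side by its seed.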
{"title":"Inter-separability and intra-concentration to enhance stochastic neural network adversarial robustness","authors":"Omar Dardour , Eduardo Aguilar , Petia Radeva , Mourad Zaied","doi":"10.1016/j.patrec.2025.02.028","DOIUrl":"10.1016/j.patrec.2025.02.028","url":null,"abstract":"<div><div>It has been shown that Deep Neural Networks can be easily fooled by adding imperceptible noise to their inputs, producing so-called adversarial examples. To address this issue, in this paper, we propose a defense method called Inter-Separability and Intra-Concentration Stochastic Neural Networks (ISIC-SNN). The proposed ISIC-SNN method learns to enlarge the separation between different label representations using label embedding and a designed inter-separability loss. It introduces uncertainty in the feature latent space using the variational information bottleneck method and enhances the compactness of the stochastic features using an intra-concentration loss. Finally, it uses dot-product similarity between stochastic feature representations and label embeddings to classify features. ISIC-SNN is trained with standard training, which is much more efficient than adversarial training. Experiments on the SVHN, CIFAR-10, and CIFAR-100 datasets demonstrate the superior defensive capability of the proposed method compared to various SNN defense methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 1-7"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
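The two loss terms named in the ISIC-SNN abstract can be illustrated loosely in numpy. This is an interpretation, not the paper's exact formulation: an inter-separability term that penalizes similar label embeddings, and an intra-concentration term that pulls each stochastic feature toward its class embedding.

```python
import numpy as np

def inter_separability(E):
    """Mean cosine similarity between distinct label embeddings (rows of E).
    Lower means the label representations are more spread apart."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = E @ E.T
    n = len(E)
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

def intra_concentration(feats, labels, E):
    """Mean squared distance of each feature to its own class embedding.
    Lower means features of a class are more compact around the embedding."""
    return float(np.mean(np.sum((feats - E[labels]) ** 2, axis=1)))
```

With orthogonal embeddings the inter-separability term is zero, and features sitting exactly on their class embedding give zero intra-concentration.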
{"title":"Editorial of the special section: CIARP 2023","authors":"Inês Domingues, Verónica Vasconcelos, Simão Paredes","doi":"10.1016/j.patrec.2024.12.013","DOIUrl":"10.1016/j.patrec.2024.12.013","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 239-240"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143518960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive shape estimation for densely cluttered objects","authors":"Jiangfan Ran, Haibin Yan","doi":"10.1016/j.patrec.2025.02.026","DOIUrl":"10.1016/j.patrec.2025.02.026","url":null,"abstract":"<div><div>Accurately recognizing the shape of objects in dense and cluttered scenes is important for robots to perform a variety of manipulation tasks, such as grasping and packing. However, the performance of previous shape estimation methods is not satisfactory due to the heavy occlusion between objects in dense clutter. In this paper, we propose an interactive exploration framework to estimate the shape of densely cluttered objects. Our framework utilizes pixel-wise uncertainty to generate efficient interactions, allowing it to achieve a better trade-off between shape estimation accuracy and interaction cost. Specifically, the extracted features are utilized as network weights to predict the confidence of each proposal located on the surface of the objects. Proposals with higher confidence are considered reliable results for shape estimation. Meanwhile, we obtain the uncertainty of shape and scale estimation based on the confidence of each proposal, and further propose an adaptive fusion strategy to construct a pixel-wise estimation uncertainty height map. In addition, our proposed interaction strategy leverages the uncertainty height map to generate effective interaction actions that significantly improve the shape estimation accuracy for severely occluded objects. Therefore, the optimal accuracy-efficiency trade-off for shape estimation in dense clutter is achieved by iterating the shape estimation and interaction actions. Extensive experimental results verify the effectiveness of the proposed approach. In challenging cases, the proposed approach achieves 66.7% and 52.0% lower average Chamfer distance than direct estimation and random interaction, respectively.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 8-14"},"PeriodicalIF":3.9,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
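The uncertainty-guided interaction loop described above can be sketched in a few lines. This is a deliberately loose illustration: the paper's adaptive fusion is richer than the fixed weighted sum used here, and `alpha` is an assumed parameter, not a value from the paper.

```python
import numpy as np

def fuse_uncertainty(shape_u, scale_u, alpha=0.5):
    """Fuse per-pixel shape and scale uncertainty maps into one height map.
    A plain convex combination stands in for the paper's adaptive fusion."""
    return alpha * shape_u + (1 - alpha) * scale_u

def next_interaction(height_map):
    """Pick the most uncertain pixel as the next interaction target."""
    return np.unravel_index(np.argmax(height_map), height_map.shape)
```

Iterating estimation and interaction then amounts to re-estimating shapes, rebuilding the height map, and probing its maximum until the uncertainty budget is spent.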
{"title":"Multi-corpus emotion recognition method based on cross-modal gated attention fusion","authors":"Elena Ryumina , Dmitry Ryumin , Alexandr Axyonov , Denis Ivanko , Alexey Karpov","doi":"10.1016/j.patrec.2025.02.024","DOIUrl":"10.1016/j.patrec.2025.02.024","url":null,"abstract":"<div><div>Automatic emotion recognition techniques are critical to natural human–computer interaction. However, current methods suffer from limited applicability due to their tendency to overfit on single-corpus datasets, which reduces their real-world effectiveness when faced with new, unseen corpora. We propose the first multi-corpus multimodal emotion recognition method with high generalizability, evaluated through a leave-one-corpus-out protocol. The method uses three fine-tuned encoders per modality (audio, video, and text) and a decoder employing context-independent gated attention to combine features from all three modalities. The research is conducted on four benchmark corpora: MOSEI, MELD, IEMOCAP, and AFEW. The proposed method achieves state-of-the-art results on these corpora and establishes the first baseline for multi-corpus studies. We demonstrate that, due to MELD's rich emotional expressiveness across all three modalities, models trained on it exhibit the best generalization ability when applied to the other corpora. We also find that the AFEW annotations correlate best with those of MOSEI, MELD, and IEMOCAP, and show the best cross-corpus performance, consistent with widely accepted theories of basic emotions.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 192-200"},"PeriodicalIF":3.9,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
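The gated attention fusion named in the abstract above can be sketched minimally: a learned gate assigns each modality a weight in (0, 1) before summation. The shapes and the per-modality scalar gate are illustrative assumptions, not the paper's trained decoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feats, W, b):
    """feats: (3, d) audio/video/text feature vectors.
    W: (3, 3*d) gate weights, b: (3,) gate biases.
    Returns the gate-weighted sum of the modality features and the gates."""
    gates = sigmoid(W @ feats.reshape(-1) + b)  # one scalar gate per modality
    fused = (gates[:, None] * feats).sum(axis=0)
    return fused, gates
```

With zero-initialized gate parameters every gate is 0.5, so fusion degrades gracefully to a scaled average; training then moves the gates to favor the more informative modalities.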
{"title":"Man2Marine : Marine mammal sound classification in small samples by transfer learning from human sound data","authors":"Qianglong Yi, Chenggang Xie, Donghai Guan, Weiwei Yuan","doi":"10.1016/j.patrec.2025.02.025","DOIUrl":"10.1016/j.patrec.2025.02.025","url":null,"abstract":"<div><div>The lack of annotated training data for marine mammal sounds poses a challenge for the use of large-scale neural network models trained in a supervised manner. Consequently, previous studies have primarily focused on classifying a limited number of species for which sufficient data are available. Drawing inspiration from the overlapping frequency ranges of human voice and marine mammal sounds, we propose a solution that uses large amounts of unannotated human speech data for self-supervised audio pre-training, followed by fine-tuning the model on marine mammal sounds. Experiments on three different datasets compare the proposed method with leading models in the voiceprint field, demonstrating that it achieves the best results, with an average classification accuracy of 91%. Furthermore, to address the over-parameterization of pre-trained models, we employ knowledge distillation to condense the model parameters, balancing model size against classification accuracy. The final model's parameters are reduced by 99.97% while maintaining a classification accuracy of 94%, comparable to the original model. This research showcases the successful application of large amounts of human voice data in the field of bioacoustics. The method significantly reduces the cost of acquiring marine mammal sound data and offers a promising approach for marine mammal sound classification in practical applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 185-191"},"PeriodicalIF":3.9,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
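The knowledge-distillation step mentioned above is usually the standard soft-target loss: cross-entropy on the hard labels plus a KL term between temperature-softened teacher and student distributions. A numpy sketch follows; the temperature `T` and mixing weight `alpha` are assumed hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, y, T=2.0, alpha=0.5):
    """Hinton-style distillation: alpha * T^2 * KL(teacher || student)
    at temperature T, plus (1 - alpha) * cross-entropy on hard label y."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    ce = -np.log(softmax(student_logits)[y])
    return alpha * (T * T) * kl + (1 - alpha) * ce
```

When the student already matches the teacher the KL term vanishes, leaving only the weighted hard-label cross-entropy.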
{"title":"Multimodal AI-enhanced ship detection for mapping fishing vessels and informing on suspicious activities","authors":"Alessandro Galdelli , Gagan Narang , Rocco Pietrini , Micol Zazzarini , Andrea Fiorani , Anna Nora Tassetti","doi":"10.1016/j.patrec.2025.02.022","DOIUrl":"10.1016/j.patrec.2025.02.022","url":null,"abstract":"<div><div>Ship detection using remote sensing and data from tracking devices like the Automatic Identification System (AIS) plays a critical role in maritime surveillance, supporting security, fisheries management, and efforts to combat illegal activities. However, challenges such as varying ship sizes, complex backgrounds, and intentional deactivation of AIS hinder accurate mapping. This study proposes a novel multimodal framework that integrates Sentinel-1 Synthetic Aperture Radar, Sentinel-2, and higher-resolution optical imagery. It features an enhanced deep-learning-based ship detection model combined with an AIS matchmaking algorithm to detect and cross-reference potentially suspicious maritime activities. The detection model is based on an enhanced You Only Look Once architecture, optimized for identifying small vessels in cluttered and noisy image backgrounds. The model achieves superior performance, surpassing state-of-the-art accuracy on multiple public datasets while reducing training time by 12% compared to baseline models. To ensure transparency within the pipeline, <em>Eigen-CAM</em> explainability techniques were employed, while <span><math><mrow><mi>C</mi><msub><mrow><mi>O</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span> emissions were minimized during training using <em>CodeCarbon</em>, aligning the process with environmentally sustainable practices. Finally, the effectiveness of the pipeline was validated in a case study, successfully identifying potential ‘dark vessels’ and highlighting their possible involvement in suspicious activities.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 15-22"},"PeriodicalIF":3.9,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
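The AIS matchmaking idea underlying the 'dark vessel' flagging can be sketched simply: a detected ship with no AIS report within some distance threshold of its position is a candidate dark vessel. The haversine distance, Earth radius, and 1 km threshold below are illustrative assumptions, not the paper's algorithm.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def flag_dark_vessels(detections, ais_positions, max_km=1.0):
    """Return detections with no AIS position within max_km."""
    return [d for d in detections
            if all(haversine_km(d, a) > max_km for a in ais_positions)]
```

A production pipeline would also gate the match on time windows and vessel kinematics, but the spatial cross-reference is the core of it.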
{"title":"Selective directed graph convolutional network for skeleton-based action recognition","authors":"Chengyuan Ke , Sheng Liu , Yuan Feng , Shengyong Chen","doi":"10.1016/j.patrec.2025.02.020","DOIUrl":"10.1016/j.patrec.2025.02.020","url":null,"abstract":"<div><div>Skeleton-based action recognition has gained significant attention due to the lightweight and robust nature of skeleton representations. However, the feature extraction process often misses subtle action cues, making it challenging to differentiate between similar actions and leading to misclassification. To address this issue, we propose a Selective Directed Graph Convolutional Network (SD-GCN) that decouples features at varying granularities to enhance sensitivity to subtle actions. Specifically, we introduce a Dynamic Topology Generation (DTG) module, which dynamically constructs a new topological structure by focusing on key local joints. This reduces the influence of dominant global features on subtle ones, thereby amplifying fine-grained motion features and improving the distinction between similar actions. Additionally, we present an Attention-guided Group Fusion (AGF) module that selectively evaluates and fuses local motion features of the skeleton while incorporating global skeletal features to capture contextual relationships among all joints. We validated the effectiveness of our method on three benchmark datasets, and experimental results demonstrate that our model not only outperforms existing methods in terms of accuracy but also excels at distinguishing similar actions.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 141-146"},"PeriodicalIF":3.9,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
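The building block that SD-GCN variants refine is the plain graph convolution over the skeleton: joint features are propagated along a normalized adjacency matrix. A minimal numpy sketch of one such layer (the generic operation, not the paper's DTG or AGF modules) is:

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step over a skeleton graph.
    X: (n_joints, d_in) joint features; A: (n, n) adjacency; W: (d_in, d_out)."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalize by degree
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)  # propagate, project, ReLU
```

On a tiny 3-joint chain with identity features and weights, each joint's output is the average of itself and its neighbors, which is exactly the smoothing that dynamic-topology modules then counteract for fine-grained cues.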
{"title":"A Bayesian network combiner for multimodal handwriting analysis in Alzheimer’s disease detection","authors":"Emanuele Nardone, Tiziana D’Alessandro, Claudio De Stefano, Francesco Fontanella, Alessandra Scotto di Freca","doi":"10.1016/j.patrec.2025.02.019","DOIUrl":"10.1016/j.patrec.2025.02.019","url":null,"abstract":"<div><div>Alzheimer’s disease, recognized as the most widespread neurodegenerative disorder worldwide, strongly affects the cognitive ability of patients. The associated cognitive impairments range from mild to severe and are a risk factor for Alzheimer’s disease; they have profound implications for individuals, even while some daily functionality is maintained. Previous studies proposed a protocol of handwriting tasks as a potential diagnostic tool for predicting the symptoms of Alzheimer’s disease. The literature reveals that the potential of multimodal handwriting analysis, leveraging data from multiple handwriting tasks, has not been fully explored. Thus, we propose a two-stage multimodal approach for Alzheimer’s disease detection using handwriting data derived from the aforementioned protocol, which includes 25 tasks. In the first stage, static and dynamic handwriting features are extracted and fused with the subject’s personal information. The data obtained for each task are then used to train a single classifier, providing task-specific predictions; for each subject, the whole system thus provides 25 different predictions. In the second stage, a Bayesian Network is used to model task interdependencies and to select, via the Markov Blanket, the subset of tasks conditionally dependent on the class label. The experimental findings demonstrate that the proposed multimodal classifier-combination approach outperforms single-task classifiers and other ensemble methods. The proposed approach achieved the highest accuracy (86.98%) by applying the Majority Vote method to the tasks included in the Markov Blanket selection.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 177-184"},"PeriodicalIF":3.9,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
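The second-stage combination described above reduces, for one subject, to restricting the 25 task-specific predictions to the Markov-blanket subset and taking a majority vote. A sketch (task names are placeholders, not the protocol's actual tasks):

```python
from collections import Counter

def majority_vote(task_preds, blanket):
    """task_preds: dict task_name -> predicted class for one subject.
    blanket: tasks selected by the Markov Blanket of the class label.
    Returns the most frequent prediction among the selected tasks."""
    votes = [task_preds[t] for t in blanket if t in task_preds]
    return Counter(votes).most_common(1)[0][0]
```

The Bayesian-network learning that produces `blanket` is the substantive contribution; once the subset is fixed, the combiner itself is this simple.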