Explaining predictive uncertainty by exposing second-order effects
Florian Bley, Sebastian Lapuschkin, Wojciech Samek, Grégoire Montavon
Pattern Recognition, Vol. 160, Article 111171 (online 14 November 2024). DOI: 10.1016/j.patcog.2024.111171
Abstract: Explainable AI has brought transparency to complex ML black boxes, enabling us, in particular, to identify which features these models use to make predictions. So far, the question of how to explain predictive uncertainty, i.e., why a model 'doubts', has been scarcely studied. Our investigation reveals that predictive uncertainty is dominated by second-order effects, involving single features or product interactions between them. We contribute a new method for explaining predictive uncertainty based on these second-order effects. Computationally, our method reduces to a simple covariance computation over a collection of first-order explanations. Our method is generally applicable, allowing for turning common attribution techniques (LRP, Gradient × Input, etc.) into powerful second-order uncertainty explainers, which we call CovLRP, CovGI, etc. The accuracy of the explanations our method produces is demonstrated through systematic quantitative evaluations, and the overall usefulness of our method is demonstrated through two practical showcases.
A Structured Bipartite Graph Learning method for ensemble clustering
Zitong Zhang, Xiaojun Chen, Chen Wang, Ruili Wang, Wei Song, Feiping Nie
Pattern Recognition, Vol. 160, Article 111133 (online 13 November 2024). DOI: 10.1016/j.patcog.2024.111133
Abstract: Given a set of base clustering results, conventional bipartite graph-based ensemble clustering methods typically require computing a sample-cluster similarity matrix from each base clustering result. These matrices are then either concatenated or averaged to form a bipartite weight matrix, which is used to create a bipartite graph. Graph-based partition techniques are subsequently applied to this graph to obtain the final clustering result. However, these methods often suffer from unreliable base clustering results, making it challenging to identify a clear cluster structure due to the variations in cluster structures across the base results. In this paper, we propose a novel Structured Bipartite Graph Learning (SBGL) method. Our approach begins by computing a sample-cluster similarity matrix from each base clustering result and constructing a base bipartite graph from each of these matrices. We assume these base bipartite graphs contain a set of latent clusters and project them into a set of sample-latent-cluster bipartite graphs. These new graphs are then ensembled into a bipartite graph with a distinct cluster structure, from which the final set of clusters is derived. Our method allows for different numbers of clusters across base clusterings, leading to improved performance. Experimental results on both synthetic and real-world datasets demonstrate the superior performance of our new method.
{"title":"Deep learning-enhanced environment perception for autonomous driving: MDNet with CSP-DarkNet53","authors":"Xuyao Guo, Feng Jiang, Quanzhen Chen, Yuxuan Wang, Kaiyue Sha, Jing Chen","doi":"10.1016/j.patcog.2024.111174","DOIUrl":"10.1016/j.patcog.2024.111174","url":null,"abstract":"<div><div>Implementing environmental perception in intelligent vehicles is a crucial application, but the parallel processing of numerous algorithms on the vehicle side is complex, and their integration remains a critical challenge. To address this problem, this paper proposes a multitask detection algorithm Multitask Detection Network (MDNet) based on Cross Stage Partial Networks with Darknet53 Backbone (CSP-DarkNet53) with high feature extraction capability, which can simultaneously detect vehicles, pedestrians, traffic lights, traffic signs, and bicycles as well as lane lines. MDNet obtains exceptional results in multitask scenarios by employing innovative architectural designs consisting of a Feature Extraction Module, Target-level Branches, and Pixel-level Branches. The feature extraction module proposes an improved CSPPF structure to extract features more efficiently for three tasks, facilitating MDNet's capacity. The target-level branch suggests PFPN, which combines features from the backbone network, and the pixel-level branch utilizes a primary feature fusion network and an enhanced C2F_Faster method to spot lane lines more precisely. By incorporating these designs, MDNet's performance in complex environments is enhanced significantly. The algorithm underwent testing on the Berkeley DeepDrive 100K (BDD100K) and Cityscapes datasets, in which it could identify traffic targets and lane lines in numerous challenging settings, resulting in a 9.8 % measure of improvement in detection accuracy map for all three tasks relative to You Only Look Once for Panoptic Driving Perception (YOLOP, a multitask detection network), an 8.9 % improvement in IoU, a 22.1 % improvement in accuracy. It reached a speed of 46fps, which serves the practical applications' requirements more effectively.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111174"},"PeriodicalIF":7.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning accurate and enriched features for stereo image super-resolution","authors":"Hu Gao, Depeng Dang","doi":"10.1016/j.patcog.2024.111170","DOIUrl":"10.1016/j.patcog.2024.111170","url":null,"abstract":"<div><div>Stereo image super-resolution (stereoSR) aims to enhance the quality of super-resolution results by incorporating complementary information from an alternative view. Although current methods have shown significant advancements, they typically operate on representations at full resolution to preserve spatial details, facing challenges in accurately capturing contextual information. Simultaneously, they utilize all feature similarities to cross-fuse information from the two views, potentially disregarding the impact of irrelevant information. To overcome this problem, we propose a mixed-scale selective fusion network (MSSFNet) to preserve precise spatial details and incorporate abundant contextual information, and adaptively select and fuse most accurate features from two views to enhance the promotion of high-quality stereoSR. Specifically, we develop a mixed-scale block (MSB) that obtains contextually enriched feature representations across multiple spatial scales while preserving precise spatial details. Furthermore, to dynamically retain the most essential cross-view information, we design a selective fusion attention module (SFAM) that searches and transfers the most accurate features from another view. To learn an enriched set of local and non-local features, we introduce a fast fourier convolution block (FFCB) to explicitly integrate frequency domain knowledge. Extensive experiments show that MSSFNet achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations. The code and the pre-trained models will be released at <span><span>https://github.com/Tombs98/MSSFNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111170"},"PeriodicalIF":7.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gradient-based sparse voxel attacks on point cloud object detection
Junqi Wu, Wen Yao, Shuai Jia, Tingsong Jiang, Weien Zhou, Chao Ma, Xiaoqian Chen
Pattern Recognition, Vol. 160, Article 111156 (online 12 November 2024). DOI: 10.1016/j.patcog.2024.111156
Abstract: Point cloud object detection is crucial for a variety of applications, including autonomous driving and robotics. Voxel-based representations of 3D point clouds have drawn significant attention due to their efficiency and effectiveness. Recent studies have revealed the vulnerability of deep learning models to adversarial attacks, while considerably less attention has been paid to the robustness of voxel-based point cloud object detectors. Existing adversarial attacks on point cloud data involve generating fake obstacles, removing objects, or producing fake predictions. Despite their demonstrated success, these approaches have three limitations. First, manipulating point data, which was originally designed for point-based representations, is inapplicable to voxel-based representations. Second, existing works that modify points across the whole scene yield redundant perturbations. Third, evaluations performed primarily on small-scale datasets, such as KITTI, do not scale well. To address these limitations, we propose a gradient-based sparse voxel attack (GSVA) algorithm for voxel-based 3D point cloud object detectors. Two novel frameworks, i.e., a re-voxelization-based voxel attack framework and a light voxel attack framework, successfully modify the voxel-based representation instead of raw points. In addition to KITTI, extensive experiments on large-scale datasets including nuScenes and the Waymo Open Dataset demonstrate the favorable attack performance (mAP decreases of 86.2%–99.5%) and the slight perturbation cost (a lowest modification rate of 3.5%) of our voxel attack method over state-of-the-art approaches.
{"title":"CAST: An innovative framework for Cross-dimensional Attention Structure in Transformers","authors":"Dezheng Wang , Xiaoyi Wei , Congyan Chen","doi":"10.1016/j.patcog.2024.111153","DOIUrl":"10.1016/j.patcog.2024.111153","url":null,"abstract":"<div><div>Dominant Transformer-based approaches rely solely on attention mechanisms and their variations, primarily emphasizing capturing crucial information within the temporal dimension. For enhanced performance, we introduce a novel architecture for Cross-dimensional Attention Structure in Transformers (CAST), which presents an innovative approach in Transformer-based models, emphasizing attention mechanisms across both temporal and spatial dimensions. The core component of CAST, the cross-dimensional attention structure (CAS), captures dependencies among multivariable time series in both temporal and spatial dimensions. The Static Attention Mechanism (SAM) is incorporated to simplify and enhance multivariate time series forecasting performance. This integration effectively reduces complexity, leading to a more efficient model. CAST demonstrates robust and efficient capabilities in predicting multivariate time series, with the simplicity of SAM broadening its applicability to various tasks. Beyond time series forecasting, CAST also shows promise in CV classification tasks. By integrating CAS into pre-trained image models, CAST facilitates spatiotemporal reasoning. Experimental results highlight the superior performance of CAST in time series forecasting and its competitive edge in CV classification tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111153"},"PeriodicalIF":7.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-supervised multi-view feature selection with adaptive similarity fusion and learning
Bingbing Jiang, Jun Liu, Zidong Wang, Chenglong Zhang, Jie Yang, Yadi Wang, Weiguo Sheng, Weiping Ding
Pattern Recognition, Vol. 159, Article 111159 (online 12 November 2024). DOI: 10.1016/j.patcog.2024.111159
Abstract: Existing multi-view semi-supervised feature selection methods typically need to compute the inverse of high-order dense matrices, rendering them impractical for large-scale applications. Meanwhile, traditional works construct similarity graphs on different views and directly fuse these graphs at the view level, ignoring the differences among samples in various views and the interplay between graph learning and feature selection. Consequently, both the reliability of the graphs and the discrimination of the selected features are compromised. To address these issues, we propose a novel multi-view semi-supervised feature selection method with Adaptive Similarity Fusion and Learning (ASFL) for large-scale tasks. Specifically, ASFL constructs bipartite graphs for each view and then leverages the relationships between samples and anchors to align anchors and graphs across different views, preserving the complementarity and consistency among views. Moreover, an effective view-to-sample fusion scheme is designed to coalesce the aligned graphs while simultaneously exploiting the neighbor structures in projection subspaces to construct a joint graph compatible across views, reducing the adverse effects of noisy features and outliers. By incorporating bipartite graph fusion and learning, label propagation, and ℓ2,0-norm multi-view feature selection into a unified framework, ASFL not only avoids the expensive computation in the solution procedures but also enhances the quality of the selected features. An effective optimization strategy with fast convergence is developed to solve the objective function, and experimental results validate its efficiency and effectiveness over state-of-the-art methods.
{"title":"DyConfidMatch: Dynamic thresholding and re-sampling for 3D semi-supervised learning","authors":"Zhimin Chen, Bing Li","doi":"10.1016/j.patcog.2024.111154","DOIUrl":"10.1016/j.patcog.2024.111154","url":null,"abstract":"<div><div>Semi-supervised learning (SSL) leverages limited labeled and abundant unlabeled data but often faces challenges with data imbalance, especially in 3D contexts. This study investigates class-level confidence as an indicator of learning status in 3D SSL, proposing a novel method that utilizes dynamic thresholding to better use unlabeled data, particularly from underrepresented classes. A re-sampling strategy is also introduced to mitigate bias towards well-represented classes, ensuring equitable class representation. Through extensive experiments in 3D SSL, our method surpasses state-of-the-art counterparts in classification and detection tasks, highlighting its effectiveness in tackling data imbalance. This approach presents a significant advancement in SSL for 3D datasets, providing a robust solution for data imbalance issues.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111154"},"PeriodicalIF":7.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Embedded feature selection for robust probability learning machines
Miguel Carrasco, Benjamin Ivorra, Julio López, Angel M. Ramos
Pattern Recognition, Vol. 159, Article 111157 (online 12 November 2024). DOI: 10.1016/j.patcog.2024.111157
Abstract:
Methods: Feature selection is essential for building effective machine learning models in binary classification. Eliminating unnecessary features can reduce the risk of overfitting and improve classification performance. Moreover, the data we handle typically contains a stochastic component, making it important to develop robust models that are insensitive to data perturbations. Although there are numerous methods and tools for feature selection, relatively few studies address embedded feature selection within robust classification models using penalization techniques.
Objective: In this work, we introduce robust classifiers with integrated feature selection capabilities, utilizing probability machines based on different penalization techniques, such as the ℓ1-norm or the elastic net, combined with a novel Direct Feature Elimination process to improve model resilience and efficiency.
Findings: Numerical experiments on standard datasets demonstrate the effectiveness and robustness of the proposed models in classification tasks, even when using a reduced number of features. These experiments were evaluated using original performance indicators, highlighting the models' ability to maintain high performance with fewer features.
Novelty: The study discusses the trade-offs involved in combining different penalties to select the most relevant features while minimizing empirical risk. In particular, the integration of elastic-net and ℓ1-norm penalties within a unified framework, combined with the original Direct Feature Elimination approach, presents a novel method for improving both model accuracy and robustness.
Structured multi-view k-means clustering
Zitong Zhang, Xiaojun Chen, Chen Wang, Ruili Wang, Wei Song, Feiping Nie
Pattern Recognition, Vol. 160, Article 111113 (online 12 November 2024). DOI: 10.1016/j.patcog.2024.111113
Abstract: K-means is a very efficient clustering method, and many multi-view k-means clustering methods have been proposed for multi-view clustering over the past decade. However, since k-means has trouble uncovering clusters of varying sizes and densities, these methods suffer from the same performance issues as k-means, and improving the clustering performance of multi-view k-means has become a challenging problem. In this paper, we propose a new multi-view k-means clustering method that is able to uncover clusters of arbitrary sizes and densities. The new method simultaneously performs three tasks: sparse connection-probability-matrix learning, prototype alignment, and cluster structure learning. We evaluate the proposed method on 5 benchmark datasets and compare it with 11 multi-view clustering methods. The experimental results on both synthetic and real-world datasets show the superiority of our proposed method.