{"title":"Learning accurate and enriched features for stereo image super-resolution","authors":"Hu Gao, Depeng Dang","doi":"10.1016/j.patcog.2024.111170","DOIUrl":"10.1016/j.patcog.2024.111170","url":null,"abstract":"<div><div>Stereo image super-resolution (stereoSR) aims to enhance the quality of super-resolution results by incorporating complementary information from an alternative view. Although current methods have shown significant advancements, they typically operate on representations at full resolution to preserve spatial details, facing challenges in accurately capturing contextual information. Simultaneously, they utilize all feature similarities to cross-fuse information from the two views, potentially disregarding the impact of irrelevant information. To overcome this problem, we propose a mixed-scale selective fusion network (MSSFNet) to preserve precise spatial details and incorporate abundant contextual information, and adaptively select and fuse most accurate features from two views to enhance the promotion of high-quality stereoSR. Specifically, we develop a mixed-scale block (MSB) that obtains contextually enriched feature representations across multiple spatial scales while preserving precise spatial details. Furthermore, to dynamically retain the most essential cross-view information, we design a selective fusion attention module (SFAM) that searches and transfers the most accurate features from another view. To learn an enriched set of local and non-local features, we introduce a fast Fourier convolution block (FFCB) to explicitly integrate frequency domain knowledge. Extensive experiments show that MSSFNet achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations. The code and the pre-trained models will be released at <span><span>https://github.com/Tombs98/MSSFNet</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111170"},"PeriodicalIF":7.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAST: An innovative framework for Cross-dimensional Attention Structure in Transformers","authors":"Dezheng Wang , Xiaoyi Wei , Congyan Chen","doi":"10.1016/j.patcog.2024.111153","DOIUrl":"10.1016/j.patcog.2024.111153","url":null,"abstract":"<div><div>Dominant Transformer-based approaches rely solely on attention mechanisms and their variations, primarily emphasizing capturing crucial information within the temporal dimension. For enhanced performance, we introduce a novel architecture for Cross-dimensional Attention Structure in Transformers (CAST), which presents an innovative approach in Transformer-based models, emphasizing attention mechanisms across both temporal and spatial dimensions. The core component of CAST, the cross-dimensional attention structure (CAS), captures dependencies among multivariable time series in both temporal and spatial dimensions. The Static Attention Mechanism (SAM) is incorporated to simplify and enhance multivariate time series forecasting performance. This integration effectively reduces complexity, leading to a more efficient model. CAST demonstrates robust and efficient capabilities in predicting multivariate time series, with the simplicity of SAM broadening its applicability to various tasks. Beyond time series forecasting, CAST also shows promise in CV classification tasks. By integrating CAS into pre-trained image models, CAST facilitates spatiotemporal reasoning. Experimental results highlight the superior performance of CAST in time series forecasting and its competitive edge in CV classification tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111153"},"PeriodicalIF":7.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised multi-view feature selection with adaptive similarity fusion and learning","authors":"Bingbing Jiang , Jun Liu , Zidong Wang , Chenglong Zhang , Jie Yang , Yadi Wang , Weiguo Sheng , Weiping Ding","doi":"10.1016/j.patcog.2024.111159","DOIUrl":"10.1016/j.patcog.2024.111159","url":null,"abstract":"<div><div>Existing multi-view semi-supervised feature selection methods typically need to calculate the inversion of high-order dense matrices, rendering them impractical for large-scale applications. Meanwhile, traditional works construct similarity graphs on different views and directly fuse these graphs from the view level, ignoring the differences among samples in various views and the interplay between graph learning and feature selection. Consequently, both the reliability of graphs and the discrimination of selected features are compromised. To address these issues, we propose a novel multi-view semi-supervised feature selection with Adaptive Similarity Fusion and Learning (ASFL) for large-scale tasks. Specifically, ASFL constructs bipartite graphs for each view and then leverages the relationships between samples and anchors to align anchors and graphs across different views, preserving the complementarity and consistency among views. Moreover, an effective view-to-sample fusion manner is designed to coalesce the aligned graphs while simultaneously exploiting the neighbor structures in projection subspaces to construct the joint graph compatible across views, reducing the adverse effects of noisy features and outliers. By incorporating bipartite graph fusion and learning, label propagation, and <span><math><msub><mrow><mi>l</mi></mrow><mrow><mn>2</mn><mo>,</mo><mn>0</mn></mrow></msub></math></span>-norm multi-view feature selection into a unified framework, ASFL not only avoids the expensive computation in the solution procedures but also enhances the quality of selected features. An effective optimization strategy with fast convergence is developed to solve the objective function, and experimental results validate its efficiency and effectiveness over state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111159"},"PeriodicalIF":7.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DyConfidMatch: Dynamic thresholding and re-sampling for 3D semi-supervised learning","authors":"Zhimin Chen, Bing Li","doi":"10.1016/j.patcog.2024.111154","DOIUrl":"10.1016/j.patcog.2024.111154","url":null,"abstract":"<div><div>Semi-supervised learning (SSL) leverages limited labeled and abundant unlabeled data but often faces challenges with data imbalance, especially in 3D contexts. This study investigates class-level confidence as an indicator of learning status in 3D SSL, proposing a novel method that utilizes dynamic thresholding to better use unlabeled data, particularly from underrepresented classes. A re-sampling strategy is also introduced to mitigate bias towards well-represented classes, ensuring equitable class representation. Through extensive experiments in 3D SSL, our method surpasses state-of-the-art counterparts in classification and detection tasks, highlighting its effectiveness in tackling data imbalance. This approach presents a significant advancement in SSL for 3D datasets, providing a robust solution for data imbalance issues.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111154"},"PeriodicalIF":7.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded feature selection for robust probability learning machines","authors":"Miguel Carrasco , Benjamin Ivorra , Julio López , Angel M. Ramos","doi":"10.1016/j.patcog.2024.111157","DOIUrl":"10.1016/j.patcog.2024.111157","url":null,"abstract":"<div><h3>Methods:</h3><div>Feature selection is essential for building effective machine learning models in binary classification. Eliminating unnecessary features can reduce the risk of overfitting and improve classification performance. Moreover, the data we handle typically contains a stochastic component, making it important to develop robust models that are insensitive to data perturbations. Although there are numerous methods and tools for feature selection, relatively few studies address embedded feature selection within robust classification models using penalization techniques.</div></div><div><h3>Objective:</h3><div>In this work, we introduce robust classifiers with integrated feature selection capabilities, utilizing probability machines based on different penalization techniques, such as the <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-norm or the elastic-net, combined with a novel Direct Feature Elimination process to improve model resilience and efficiency.</div></div><div><h3>Findings:</h3><div>Numerical experiments on standard datasets demonstrate the effectiveness and robustness of the proposed models in classification tasks even when using a reduced number of features. These experiments were evaluated using original performance indicators, highlighting the models’ ability to maintain high performance with fewer features.</div></div><div><h3>Novelty:</h3><div>The study discusses the trade-offs involved in combining different penalties to select the most relevant features while minimizing empirical risk. In particular, the integration of elastic-net and <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-norm penalties within a unified framework, combined with the original Direct Feature Elimination approach, presents a novel method for improving both model accuracy and robustness.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111157"},"PeriodicalIF":7.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topology reorganized graph contrastive learning with mitigating semantic drift","authors":"Jiaqiang Zhang, Songcan Chen","doi":"10.1016/j.patcog.2024.111160","DOIUrl":"10.1016/j.patcog.2024.111160","url":null,"abstract":"<div><div>Graph contrastive learning (GCL) is an effective paradigm for node representation learning in graphs. The key components hidden behind GCL are data augmentation and positive–negative pair selection. Typical data augmentations in GCL, such as uniform deletion of edges, are generally blind and resort to local perturbation, which is prone to producing under-diversity views. Additionally, there is a risk of making the augmented data traverse to other classes. Moreover, most methods always treat all other samples as negatives. Such a negative pairing naturally results in sampling bias and likewise may make the learned representation suffer from semantic drift. Therefore, to increase the diversity of the contrastive view, we propose two simple and effective global topological augmentations to compensate current GCL. One is to mine the semantic correlation between nodes in the feature space. The other is to utilize the algebraic properties of the adjacency matrix to characterize the topology by eigen-decomposition. With the help of both, we can retain important edges to build a better view. To reduce the risk of semantic drift, a prototype-based negative pair selection is further designed which can filter false negative samples. Extensive experiments on various tasks demonstrate the advantages of the model compared to the state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111160"},"PeriodicalIF":7.5,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive representation learning and sample weighting for low-quality 3D face recognition","authors":"Cuican Yu , Fengxun Sun , Zihui Zhang , Huibin Li , Liming Chen , Jian Sun , Zongben Xu","doi":"10.1016/j.patcog.2024.111161","DOIUrl":"10.1016/j.patcog.2024.111161","url":null,"abstract":"<div><div>3D face recognition (3DFR) algorithms have advanced significantly in the past two decades by leveraging facial geometric information, but they mostly focus on high-quality 3D face scans, thus limiting their practicality in real-world scenarios. Recently, with the development of affordable consumer-level depth cameras, the focus has shifted towards low-quality 3D face scans. In this paper, we propose a method for low-quality 3DFR. On one hand, our approach employs the normalizing flow to model an adaptive-form distribution for any given 3D face scan. This adaptive distributional representation learning strategy allows for more robust representations of low-quality 3D face scans (which may be caused by the scan noises, pose or occlusion variations, etc.). On the other hand, we introduce an adaptive sample weighting strategy to adjust the importance of each training sample by measuring both the difficulty of being recognized and the data quality. This adaptive sample weighting strategy can further enhance the robustness of the deep model and meanwhile improve its performance on low-quality 3DFR. Through comprehensive experiments, we demonstrate that our method can significantly improve the performance of low-quality 3DFR. For example, our method achieves competitive results on both the IIIT-D database and the Lock3DFace datasets, underscoring its effectiveness in addressing the challenges associated with low-quality 3D faces.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111161"},"PeriodicalIF":7.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corruption-based anomaly detection and interpretation in tabular data","authors":"Chunghyup Mok , Seoung Bum Kim","doi":"10.1016/j.patcog.2024.111149","DOIUrl":"10.1016/j.patcog.2024.111149","url":null,"abstract":"<div><div>Recent advances in self-supervised learning (SSL) have proven crucial in effectively learning representations of unstructured data, encompassing text, images, and audio. Although the applications of these advances in anomaly detection have been explored extensively, applying SSL to tabular data presents challenges because of the absence of prior information on data structure. In response, we propose a framework for anomaly detection in tabular datasets using variable corruption. Through selective variable corruption and assignment of new labels based on the degree of corruption, our framework can effectively distinguish between normal and abnormal data. Furthermore, analyzing the impact of corruption on anomaly scores aids in the identification of important variables. Experimental results obtained from various tabular datasets validate the precision and applicability of the proposed method. The source code can be accessed at <span><span>https://github.com/mokch/CAIT</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111149"},"PeriodicalIF":7.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online indoor visual odometry with semantic assistance under implicit epipolar constraints","authors":"Yang Chen , Lin Zhang , Shengjie Zhao , Yicong Zhou","doi":"10.1016/j.patcog.2024.111150","DOIUrl":"10.1016/j.patcog.2024.111150","url":null,"abstract":"<div><div>Among solutions to the tasks of indoor localization and reconstruction, compared with traditional SLAM (Simultaneous Localization And Mapping), learning-based VO (Visual Odometry) has gained more and more popularity due to its robustness and low cost. However, the performance of existing indoor deep VOs is still limited in comparison with their outdoor counterparts mainly owing to large areas of textureless regions and complex indoor motions containing much more rotations. In this paper, the above two challenges are carefully tackled with the proposed SEOVO (Semantic Epipolar-constrained Online VO). On the one hand, as far as we know, SEOVO is the first semantic-aided VO under an online adaptive framework, which adaptively reconstructs low-texture planes without any supervision. On the other hand, we introduce the epipolar geometric constraint in an implicit way for improving the accuracy of pose estimation without destroying the global scale consistency. The efficiency and efficacy of SEOVO have been corroborated by extensive experiments conducted on both public datasets and our collected video sequences.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111150"},"PeriodicalIF":7.5,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSCIMABNet: A novel multi-head attention depthwise separable CNN model for skin cancer detection","authors":"Hatice Catal Reis , Veysel Turk","doi":"10.1016/j.patcog.2024.111182","DOIUrl":"10.1016/j.patcog.2024.111182","url":null,"abstract":"<div><div>Skin cancer is a common type of cancer worldwide. Early diagnosis of skin cancer can reduce the risk of death by increasing treatment success. However, it is challenging for dermatologists or specialists because the symptoms are vague in the early stages and cannot be noticed by the naked eye. This study examines digital diagnostic techniques supported by artificial intelligence, focusing on early skin cancer detection and two methods have been proposed. In the first method, DSCIMABNet deep learning architecture was developed by combining multi-head attention and depthwise separable convolution techniques. This model provides flexibility in learning the dataset's local features, abstract concepts, and long-term relationships. The DSCIMABNet model and modern deep learning models trained on ImageNet are proposed to be combined with the ensemble learning method in the second method. This approach provides a comprehensive feature extraction process that will increase the performance of the classification process with ensemble learning. The proposed approaches are trained and evaluated on the ISIC 2018 dataset with image enhancement applied in preprocessing. In the experimental results, DSCIMABNet achieved 84.28% accuracy, while the proposed hybrid method achieved 99.40% accuracy. Moreover, on the Mendeley dataset (CNN for Melanoma Detection Data), DSCIMABNet achieved 92.58% accuracy, while the hybrid method achieved 99.37% accuracy. This study may significantly contribute to developing new and effective methods for the early diagnosis and treatment of skin cancer.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111182"},"PeriodicalIF":7.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}