{"title":"K-BEST subspace clustering: kernel-friendly block-diagonal embedded and similarity-preserving transformed subspace clustering","authors":"Jyoti Maggu, Anurag Goel","doi":"10.1007/s10044-024-01336-2","DOIUrl":"https://doi.org/10.1007/s10044-024-01336-2","url":null,"abstract":"<p>Subspace clustering methods, employing sparse and low-rank models, have demonstrated efficacy in clustering high-dimensional data. These approaches typically assume the separability of input data into distinct subspaces, a premise that does not hold true in general. Furthermore, prevalent low-rank and sparse methods relying on self-expression exhibit effectiveness primarily with linear structure data, facing limitations in processing datasets with intricate nonlinear structures. While kernel subspace clustering methods excel in handling nonlinear structures, they may compromise similarity information during the reconstruction of original data in kernel space. Additionally, these methods may fall short of attaining an affinity matrix with an optimal block-diagonal property. In response to these challenges, this paper introduces a novel subspace clustering approach named Similarity Preserving Kernel Block Diagonal Representation based Transformed Subspace Clustering (KBD-TSC). KBD-TSC contributes in three key aspects: (1) integration of a kernelized version of transform learning within a subspace clustering framework, introducing a block diagonal representation term to generate an affinity matrix with a block-diagonal structure. (2) Construction and integration of a similarity preserving regularizer into the model by minimizing the discrepancy between inner products of the original data and those of the reconstructed data in kernel space. This facilitates enhanced preservation of similarity information between the original data points. 
(3) Proposal of KBD-TSC by integrating the block diagonal representation term and similarity preserving regularizer into a kernel self-expressing model. The optimization of the proposed model is efficiently addressed through the alternating direction method of multipliers. This study validates the effectiveness of the proposed KBD-TSC method through experimental results obtained from nine datasets, showcasing its potential in addressing the limitations of existing subspace clustering techniques.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"14 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden Markov models with multivariate bounded asymmetric student’s t-mixture model emissions","authors":"Ons Bouarada, Muhammad Azam, Manar Amayri, Nizar Bouguila","doi":"10.1007/s10044-024-01341-5","DOIUrl":"https://doi.org/10.1007/s10044-024-01341-5","url":null,"abstract":"<p>Hidden Markov models (HMMs) are popular methods for continuous sequential data modeling and classification tasks. In such applications, the observation emission densities of the HMM hidden states are generally continuous, can vary from one model to the other, and are typically modeled by elliptically contoured distributions, namely Gaussians or Student’s t-distributions. In this context, this paper proposes a novel HMM with Bounded Asymmetric Student’s t-Mixture Model (BASMM) emissions. Our new BASMMHMM is introduced in the light of the added robustness guaranteed by the BASMM in comparison to other popular emission distributions such as the Gaussian Mixture Model (GMM). In fact, GMMs generally have limited performance when the data sets (observations) that the HMM is fitted to contain outliers. Also, GMMs cannot sufficiently model skewed populations, which are typical in many fields, such as financial or signal processing-related data sets. An excellent alternative to solve this problem is found in Student’s t-mixture models. They have similar behaviour and shape to GMMs, but with heavier tails. This allows more tolerance towards data sets that span extensive ranges and include outliers. Asymmetry and bounded support are also important features that can further extend the model’s flexibility and fit the imperfections of real-world data. This leads us to explore the effectiveness of the BASMM as an observation emission distribution in HMMs, hence the proposed BASMMHMM. 
We will also demonstrate the improved robustness of our model by presenting the results of three different experiments: occupancy estimation, stock price prediction, and human activity recognition.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"14 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on decoupled adaptive graph convolution networks based on skeleton data for action recognition","authors":"Haigang Deng, Guocheng Lin, Chengwei Li, Chuanxu Wang","doi":"10.1007/s10044-024-01319-3","DOIUrl":"https://doi.org/10.1007/s10044-024-01319-3","url":null,"abstract":"<p>Graph convolutional networks are well suited to feature extraction from non-Euclidean human skeleton data, but their adjacency matrix is fixed and the receptive field is small, which results in a biased representation of intrinsic skeleton information. In addition, mean pooling of spatio-temporal features in the classification layer loses information and degrades recognition accuracy. To this end, the Decoupled Adaptive Graph Convolutional Network (DAGCN) is proposed. Specifically, a multi-level adaptive adjacency matrix is designed, which can dynamically obtain the rich correlation information among the skeleton nodes by a non-local adaptive algorithm. Then, a new Residual Multi-scale Temporal Convolution Network (RMTCN) is proposed to fully extract temporal features of the decoupled skeleton data. For the second problem, in classification, we decompose the spatio-temporal features into three parts (spatial, temporal, and spatio-temporal information), which are average-pooled separately and added together for classification; this is denoted the STMP (spatio-temporal mean pooling) module. 
Experimental results show that our algorithm achieves accuracy of 96.5%, 90.6%, 96.4% on NTU-RGB+D60, NTU-RGB+D120 and NW-UCLA data sets respectively.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YOLOv7-GCM: a detection algorithm for creek waste based on improved YOLOv7 model","authors":"Jianhua Qin, Honglan Zhou, Huaian Yi, Luyao Ma, Jianhan Nie, Tingting Huang","doi":"10.1007/s10044-024-01338-0","DOIUrl":"https://doi.org/10.1007/s10044-024-01338-0","url":null,"abstract":"<p>To enhance the cleanliness of creek environments, quadruped robots can be utilized to detect creek waste. The continuous changes in the water environment significantly reduce the accuracy of image detection when using quadruped robots for image acquisition. In order to improve the accuracy of quadruped robots in waste detection, this article proposes a creek waste detection model called YOLOv7-GCM. The model integrates a global attention mechanism (GAM) into the YOLOv7 model, which achieves accurate waste detection in ever-changing backgrounds and underwater conditions. Content-aware reassembly of features (CARAFE) replaces the up-sampling of the YOLOv7 model to achieve more accurate and efficient feature reconstruction. A minimum point distance intersection over union (MPDIOU) loss function replaces the CIOU loss function of the YOLOv7 model to more accurately measure the similarity between target boxes and predicted boxes. After the aforementioned improvements, the YOLOv7-GCM model was obtained. A quadruped robot was used to patrol the creek and collect images of creek waste. Finally, the YOLOv7-GCM model was trained on the creek waste dataset. The experimental results show that the precision rate of the YOLOv7-GCM model has increased by 4.2% and the mean average precision (mAP@0.5) has increased by 2.1%. 
The YOLOv7-GCM model provides a new method for identifying creek waste, which may help promote efficient waste management.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"36 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling the unseen: novel strategies for object detection beyond known distributions","authors":"S. Devi, R. Dayana, P. Malarvezhi","doi":"10.1007/s10044-024-01334-4","DOIUrl":"https://doi.org/10.1007/s10044-024-01334-4","url":null,"abstract":"<p>In contemporary machine learning, models often struggle with data distribution variations, severely impacting their out-of-distribution (OOD) generalization and detection capabilities. Current object detection methods, relying on virtual outlier synthesis and class-conditional density estimation, struggle to effectively distinguish OOD samples. They often depend on accurate density estimation and may produce virtual outliers that lack realism, particularly in complex or dynamic environments. Furthermore, previous research has typically addressed covariate and semantic shifts independently, resulting in fragmented solutions that fail to comprehensively tackle OOD generalization. This study introduces a unified approach to enhance OOD generalization in object recognition models, addressing these critical gaps. The strategy involves employing adversarial perturbations on the ID (In-Distribution) dataset to enhance the model’s resilience to distribution shifts, thereby simulating potential real-world scenarios characterized by imperceptible variations. Additionally, the integration of Maximum Mean Discrepancy (MMD) at the object level effectively discriminates between ID and OOD samples by quantifying distributional differences. For precise OOD detection, a K-nearest neighbors (KNN) algorithm is used during inference to measure similarity between samples and their closest neighbors in the training data. Evaluations on benchmark datasets, including PASCAL VOC and BDD100K as ID, with COCO and Open Images subsets as OOD, demonstrate significant improvements in OOD generalization compared to existing methods. 
These discoveries underscore the framework’s potential to elevate the dependability and flexibility of object recognition systems in practical scenarios, particularly in autonomous vehicles where accurate object detection under diverse conditions is critical for safety. This research contributes to advancing OOD generalization techniques and lays the groundwork for future refinement to address evolving challenges in machine learning applications. The code can be accessed from https://github.com/DeviSPhd/<span>(OODG_OD)</span></p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"94 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LDC-PP-YOLOE: a lightweight model for detecting and counting citrus fruit","authors":"Yibo Lv, Shenglian Lu, Xiaoyu Liu, Jiangchuan Bao, Binghao Liu, Ming Chen, Guo Li","doi":"10.1007/s10044-024-01329-1","DOIUrl":"https://doi.org/10.1007/s10044-024-01329-1","url":null,"abstract":"<p>In the citrus orchard environment, accurate counting of the fruit and the use of lightweight detection methods are the key preliminary steps for automating citrus picking and yield estimation. Most high-precision fruit detection models based on deep learning use complex models that require devices with large amounts of computational resources and memory. Devices with limited resources cannot meet the requirements of these models. Thus, to overcome this problem, we focus on creating a lightweight model with a convolutional neural network. In this research, we propose a lightweight citrus detection model for mobile devices, LDC-PP-YOLOE. LDC-PP-YOLOE is improved based on PP-YOLOE by using localized knowledge distillation and CBAM, with a mAP@0.5 of 88%, mAP@0.95 of 51.3%, params of 8 M and speed of 0.34 s. The performance of LDC-PP-YOLOE was compared against commonly used detectors and LDC-PP-YOLOE’s mAP@0.5 was 2.5, 6.9 and 16.3%, and was 4.3% greater than Faster R-CNN, YOLOX-s and PicoDet-L, respectively. LDC-PP-YOLOE achieved an RMSE of 8.63 and an MSE of 5.27 compared to the ground truth on citrus applications. In addition, we used apple and passion fruit datasets to verify the generalization of the model; the mAP@0.5 is improved by 1 and 0.7%. LDC-PP-YOLOE can be used as a lightweight model to help growers track citrus populations and optimize citrus yields in complex citrus orchard environments with resource-limited equipment. 
It also provides a solution for lightweight models.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"190 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods for calculating gliding-box lacunarity efficiently on large datasets","authors":"Bálint Barna H. Kovács, Miklós Erdélyi","doi":"10.1007/s10044-024-01332-6","DOIUrl":"https://doi.org/10.1007/s10044-024-01332-6","url":null,"abstract":"<p>Lacunarity has proven to be a useful, multifaceted tool for image analysis in several different scientific fields, from geography to virology, which has lent increasing importance to the lacunarity analysis of large datasets. It can be most reliably calculated with the so-called gliding-box method, but the evaluation process can be exceedingly time-consuming and unviable as this algorithm is not designed to operate on large datasets. Here we introduce two novel methods that can calculate gliding-box lacunarity orders of magnitude faster than the original method without any loss of accuracy. We compare these methods with the original as well as with two already existing optimized methods based on runtime memory usage and complexity. The application of all five methods to the analysis of both 2D and 3D datasets confirms that each of the four optimized methods is orders of magnitude faster than the original one, but each has its advantages and limitations.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"69 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142247461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DN3MF: deep neural network for non-negative matrix factorization towards low rank approximation","authors":"Prasun Dutta, Rajat K. De","doi":"10.1007/s10044-024-01335-3","DOIUrl":"https://doi.org/10.1007/s10044-024-01335-3","url":null,"abstract":"<p>Dimension reduction is one of the most sought-after methodologies to deal with high-dimensional ever-expanding complex datasets. Non-negative matrix factorization (NMF) is one such technique for dimension reduction. Here, a multiple deconstruction multiple reconstruction deep learning model (DN3MF) for NMF targeted towards low rank approximation, has been developed. Non-negative input data has been processed using hierarchical learning to generate part-based sparse and meaningful representation. The novel design of DN3MF ensures the non-negativity requirement of the model. The use of Xavier initialization technique solves the exploding or vanishing gradient problem. The objective function of the model has been designed employing regularization, ensuring the best possible approximation of the input matrix. A novel adaptive learning mechanism has been developed to accomplish the objective of the model. The superior performance of the proposed model has been established by comparing the results obtained by the model with that of six other well-established dimension reduction algorithms on three well-known datasets in terms of preservation of the local structure of data in low rank embedding, and in the context of downstream analyses using classification and clustering. The statistical significance of the results has also been established. The outcome clearly demonstrates DN3MF’s superiority over compared dimension reduction approaches in terms of both statistical and intrinsic property preservation standards. 
The comparative analysis of all seven dimensionality reduction algorithms including DN3MF with respect to the computational complexity and a pictorial depiction of the convergence analysis for both stages of DN3MF have also been presented.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"6 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MFA U-Net: a U-Net like multi-stage feature analysis network for medical image segmentation","authors":"Yupeng Wang, Suyu Wang, Jian He","doi":"10.1007/s10044-024-01331-7","DOIUrl":"https://doi.org/10.1007/s10044-024-01331-7","url":null,"abstract":"<p>The U-Net and its extensions have achieved good success in medical image segmentation. However, fine-grained segmentation of objects at their fuzzy edges, which are commonly found in medical images, is still challenging. In this paper, we propose a U-Net like Multi-Stage Feature Analysis Network (MFA U-Net) for medical image segmentation, which focuses on mining the reusability of the images and features from several perspectives. Firstly, a multi-channel dimensional feature extraction module is proposed, where the input image is reused by multiple branches of convolutions with different channels to generate supplementary features for the original U-shaped network. Next, a cascaded U-shaped network is designed for deeper feature mining and analysis, which enables progressive refinement of the features. In the neck of the cascaded network, a parallel hybrid convolution module is designed that concatenates several types of convolutional methods to enhance the semantic representation ability of the model. In short, by reusing the input images and detected features in several stages, more effective features are extracted and the segmentation performance is improved. The proposed algorithm was evaluated on three mainstream 2D color medical image segmentation datasets and achieves significant improvements compared with the traditional U-Net framework, as well as the latest improved ones. 
Compared to the baseline network, it achieves improvements of 0.93% (Dice) and 1.45% (IoU) on GlaS, 2.09% (Dice) and 2.87% (IoU) on MoNuSeg, and 0.17% (F1) and 1.72% (SE) on DRIVE.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"11 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel residual fourier convolution model for brain tumor segmentation of mr images","authors":"Haipeng Zhu, Hong He","doi":"10.1007/s10044-024-01312-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01312-w","url":null,"abstract":"<p>Magnetic resonance imaging is an essential tool for the early diagnosis of brain tumors. However, segmenting brain tumors in magnetic resonance images is challenging because of blurred boundaries and variable spatial structure. Therefore, combining multiple brain datasets, a novel residual Fourier convolution model with local interpretability is presented in this study to address these problems. Firstly, an interpretable residual Fourier convolution encoder is constructed by the Fourier transform and its inverse for fast extraction of the spectral features of the brain tumor regions. Furthermore, the dilated-gated attention mechanism is designed to expand the receptive fields and extract blurred irregular boundary features that are closer to the lesion regions. Finally, the encoder-decoder spatial attention fusion mechanism is developed to further extract more fine-grained contextual spatial features from the variable spatial structure of adjacent magnetic resonance slices. Compared to other advanced models, our proposed model has achieved state-of-the-art average segmentation performance by testing on the BraTS2019, Figshare, and TCIA datasets. The average Dice coefficient, sensitivity, MIoU, and PPV reach 0.892, 87.1%, 0.843, and 91.5%, respectively. 
The proposed segmentation framework can provide more reliable segmentation results for the early diagnosis of brain tumors because of its robust feature extraction ability, interpretability, and generalization ability.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"20 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}