{"title":"Subspace clustering via adaptive-loss regularized representation learning with latent affinities","authors":"Kun Jiang, Lei Zhu, Zheng Liu, Qindong Sun","doi":"10.1007/s10044-024-01226-7","DOIUrl":"https://doi.org/10.1007/s10044-024-01226-7","url":null,"abstract":"<p>High-dimensional data that lie on several subspaces tend to be highly correlated and contaminated by various noises, and their affinities across different subspaces are not always reliable, which impedes the effectiveness of subspace clustering. To alleviate these deficiencies, we propose a novel subspace learning model via adaptive-loss regularized representation learning with latent affinities (ALRLA). Specifically, robust least-squares regression with a nonnegativity constraint is first proposed to generate more interpretable reconstruction coefficients in the low-dimensional subspace and to specify the weighted self-representation capability with an adaptive loss norm for better robustness and discrimination. Moreover, an adaptive latent graph learning regularizer with an initialized affinity approximation is considered to provide more accurate and robust neighborhood assignment for the low-dimensional representations. Finally, the objective model is solved by an alternating optimization algorithm, with theoretical analyses of its convergence and computational complexity. Extensive experiments on benchmark databases demonstrate that the ALRLA model produces more clearly structured representations in redundant and noisy data environments and achieves competitive clustering performance compared with state-of-the-art clustering models.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"170 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive spectral graph wavelets for collaborative filtering","authors":"Osama Alshareet, A. Ben Hamza","doi":"10.1007/s10044-024-01214-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01214-x","url":null,"abstract":"<p>Collaborative filtering is a popular approach in recommender systems, whose objective is to provide personalized item suggestions to potential users based on their purchase or browsing history. However, personalized recommendations require a considerable amount of behavioral data on users, which is usually unavailable for new users, giving rise to the cold-start problem. To help alleviate this challenging problem, we introduce a spectral graph wavelet collaborative filtering framework for implicit feedback data, where users, items and their interactions are represented as a bipartite graph. Specifically, we first propose an adaptive transfer function that leverages a power transform to stabilize the variance of graph frequencies in the spectral domain. Then, we design a deep recommendation model for efficient learning of low-dimensional embeddings of users and items using spectral graph wavelets in an end-to-end fashion. In addition to capturing the graph’s local and global structures, our approach yields localization of graph signals in both the spatial and spectral domains, and hence not only learns discriminative representations of users and items but also improves recommendation quality. The effectiveness of our proposed model is demonstrated through extensive experiments on real-world benchmark datasets, achieving better recommendation performance than strong baseline methods.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"2673 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Redirected transfer learning for robust multi-layer subspace learning","authors":"","doi":"10.1007/s10044-024-01233-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01233-8","url":null,"abstract":"<h3>Abstract</h3> <p>Unsupervised transfer learning methods usually exploit labeled source data to learn a classifier for unlabeled target data with a different but related distribution. However, most existing transfer learning methods leverage a 0-1 matrix as labels, which greatly narrows the flexibility of transfer learning. Another major limitation is that these methods are influenced by the redundant features and noise residing in cross-domain data. To cope with these two issues simultaneously, this paper proposes a redirected transfer learning (RTL) approach for unsupervised transfer learning with a multi-layer subspace learning structure. Specifically, in the first layer, we learn a robust subspace where data from different domains can be well interlaced. This is achieved by reconstructing each target sample with the lowest-rank representation of source samples. Besides, imposing <span> <span>(L_{2,1})</span> </span>-norm sparsity on the regression term and the regularization term brings robustness against noise and enables the selection of informative features, respectively. In the second layer, we further introduce a redirected label strategy in which the strict binary labels are relaxed into continuous values for each datum. To handle the unknown labels of the target domain effectively, we construct pseudo-labels iteratively for unlabeled target samples to improve the discriminative ability in classification. The superiority of our method in classification tasks is confirmed on several cross-domain datasets.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"50 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D orientation field transform","authors":"Wai-Tsun Yeung, Xiaohao Cai, Zizhen Liang, Byung-Ho Kang","doi":"10.1007/s10044-024-01212-z","DOIUrl":"https://doi.org/10.1007/s10044-024-01212-z","url":null,"abstract":"<p>Vascular structure enhancement is very useful in image processing and computer vision. Enhancing the presence of structures like tubular networks in given images can improve image-dependent diagnostics and can also facilitate tasks like segmentation. The two-dimensional (2D) orientation field transform has been proven effective at enhancing 2D contours and curves in images by means of top-down processing. It has, however, no counterpart for 3D images, due to the far more complicated orientations in 3D than in 2D. Given the rising demand and interest in handling 3D images, we modularise the concept and generalise the algorithm to 3D curves. In this work, we propose a 3D orientation field transform. It is a vascular structure enhancement algorithm that can cleanly enhance images with very low signal-to-noise ratios, pushing the limits of 3D image quality that can be enhanced computationally. This work also exploits the benefits of modularity, offering several combinative options that each yield moderately better enhancement results in different scenarios. In principle, the proposed 3D orientation field transform can naturally tackle any number of dimensions. As a special case, it is also ideal for 2D images, with a simpler methodology than the previous 2D orientation field transform. The concise structure of the proposed 3D orientation field transform also allows it to be combined with other enhancement algorithms and used as a preliminary filter for tasks like segmentation and detection. The effectiveness of the proposed method is demonstrated on synthetic 3D images and real-world transmission electron microscopy tomograms, ranging from 2D curve enhancement to the more important and interesting 3D case. Extensive experiments and comparisons with existing related methods also demonstrate the excellent performance of the proposed 3D orientation field transform.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"15 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning discriminative local contexts for person re-identification in vehicle surveillance scenarios","authors":"","doi":"10.1007/s10044-024-01219-6","DOIUrl":"https://doi.org/10.1007/s10044-024-01219-6","url":null,"abstract":"<h3>Abstract</h3> <p>In recent years, person re-identification (Re-ID) has been widely used in intelligent surveillance and security. However, Re-ID faces many challenges in vehicle surveillance scenarios, such as heavy occlusion, misalignment, and similar appearances. Most Re-ID methods focus on learning discriminative global features or dividing regions for local feature learning, which may ignore critical but subtle differences between pedestrians. In this paper, we propose a local context aggregation branch for learning discriminative local contexts at multiple scales, which can supplement the critical detailed information omitted from global features. Specifically, we exploit dilated convolutions to simulate a spatial feature pyramid that captures multi-scale spatial contexts efficiently. The essential information that can distinguish different pedestrians is then emphasized. Besides, we construct a Re-ID dataset named BSV for vehicle surveillance scenarios and propose a triplet loss with station constraint enhancement, which utilizes additional valuable station information to construct penalty terms that further improve Re-ID performance. Extensive experiments are conducted on the proposed BSV dataset and two standard Re-ID datasets, and the results validate the effectiveness of our method.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmenting large historical notarial manuscripts into multi-page deeds","authors":"Jose Ramón Prieto, David Becerra, Alejandro Hector Toselli, Carlos Alonso, Enrique Vidal","doi":"10.1007/s10044-024-01235-6","DOIUrl":"https://doi.org/10.1007/s10044-024-01235-6","url":null,"abstract":"<p>Archives around the world hold vast digitized series of historical manuscript books or “bundles” containing, among others, notarial records also known as “deeds” or “acts”. One of the first steps in providing metadata that describe the contents of those bundles is to segment them into their individual deeds. Even when deeds are page-aligned, as in the bundles considered in the present work, this is a time-consuming task, often prohibitive given the huge scale of the manuscript series involved. Unlike traditional Layout Analysis methods for page-level segmentation, our approach goes beyond the realm of a single page image, providing consistent deed detection results on full bundles. This is achieved in two tightly integrated steps: first, we estimate page-level class posteriors for the “initial”, “middle”, and “final” classes; then we “decode” these posteriors, applying a series of sequentiality consistency constraints, to obtain a consistent book segmentation. Experiments are presented for four large historical manuscripts, varying the number of deeds used for training. Two metrics are introduced to assess the quality of book segmentation, one of which takes into account the loss of information entailed by segmentation errors. The problem formalization, the metrics and the empirical work significantly extend our previous works on this topic.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"253 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A spatio-temporal binary grid-based clustering model for seismicity analysis","authors":"Rahul Kumar Vijay, Satyasai Jagannath Nanda, Ashish Sharma","doi":"10.1007/s10044-024-01234-7","DOIUrl":"https://doi.org/10.1007/s10044-024-01234-7","url":null,"abstract":"<p>This paper presents a spatio-temporal binary grid-based clustering model for determining complex earthquake clusters with different shapes and heterogeneous densities present in a catalog. The 3D occurrences of earthquakes are mapped into a 2D low-memory sparse matrix through a grid mechanism in the binary domain, taking spatio-temporal attributes into account. Then, after image transformation of the non-empty-set binary feature matrix, a clustering strategy is implemented with the logical AND operator as the similarity measure among binary vectors. This approach is applied to the problem of seismicity declustering, which separates the clustering and non-clustering patterns of seismicity, for real-world earthquake catalogs of Japan (1972–2020) and the Eastern Mediterranean (1966–2020). Results demonstrate that the proposed method achieves a significant reduction in both computation and memory footprint, with few tuning parameters. Background earthquakes conform to a homogeneous Poisson process with largely memoryless characteristics in the time domain, as evident from graphical and statistical analyses. Overall seismicity and the observed background activity both have similar multi-fractal behavior, with a deviation of <span>(pm 0.04)</span>. A comparative analysis is carried out with benchmark declustering models: the Gardner–Knopoff, Uhrhammer, and Gruenthal window-based methods, and Reasenberg’s approach; the proposed method performs better in most cases.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"254 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Epileptic EEG signal classification using an improved VMD-based convolutional stacked autoencoder","authors":"Sebamai Parija, Pradipta Kishore Dash, Ranjeeta Bisoi","doi":"10.1007/s10044-024-01221-y","DOIUrl":"https://doi.org/10.1007/s10044-024-01221-y","url":null,"abstract":"<p>Numerous techniques have been explored so far for epileptic electroencephalograph (EEG) signal detection and classification. Deep learning-based approaches are in recent demand for classifying data with huge feature sets. In this paper, an improved deep learning approach is proposed for EEG signal classification, based on convolutional features followed by a stacked autoencoder (CSAE) and a kernel extreme learning machine (KELM) classifier at the end. The convolutional network extracts initial features by convolution, and after this stage the features are supplied to a stacked autoencoder (SAE) to obtain the final compressed features. These suitable features are then fed to the KELM classifier to identify seizure, seizure-free and healthy EEG signals. The EEG signals are decomposed through chaotic water cycle algorithm-optimised variational mode decomposition (CWCA-OVMD), from which the optimised number of efficient modes is obtained, yielding six features: energy, entropy, standard deviation, variance, kurtosis, and skewness. These CWCA-OVMD-based features are then fed to the CSAE for the extraction of relevant features. Once the features are obtained, the KELM classifier is used to classify the EEG signal. The classification results are compared with those of different deep learning classifiers, validating the efficacy of the proposed model. One major advantage is that, unlike traditional classifiers, the KELM classifier avoids the choice of the number of hidden neurons in the end layer.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"108 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A detection method for occluded and overlapped apples under close-range targets","authors":"Yuhui Yuan, Hubin Liu, Zengrong Yang, Jianhua Zheng, Junhui Li, Longlian Zhao","doi":"10.1007/s10044-024-01222-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01222-x","url":null,"abstract":"<p>Accurate and rapid identification and localization of apples contributes to speeding up automated harvesting. However, in unstructured orchard environments, apples are commonly overlapped and occluded by branches and leaves, which interferes with apple identification and localization. In order to quickly reconstruct fruits under overlapping and occlusion conditions, an adaptive radius selection strategy based on the random sample consensus algorithm (ARSS-RANSAC) was proposed. First, the edges of apples in the image were obtained using image preprocessing. Second, an adaptive radius selection strategy based on fruit shape characteristics was proposed, in which the initial fruit radius is obtained through horizontal or vertical scanning. Then, the RANSAC algorithm was used with the determined radius to select effective contour points and obtain the circle centre coordinates. Finally, a circle was fitted to the selected valid contour points, achieving the recognition and localization of overlapped and occluded apples. A set of 175 apple images with different degrees of overlap and branch-and-leaf occlusion was used to verify the effectiveness of the algorithm. The evaluation indicators of overlap rate, average false-positive rate, average false-negative rate, and average segmentation error of ARSS-RANSAC were improved compared with the classical Hough transform method. The detection time for a single image was less than 50 ms, which meets the requirements of real-time target detection. The experimental results show that the ARSS-RANSAC algorithm can quickly and accurately identify and locate occluded and overlapped apples, and it is expected to be applied to harvesting robots for apples and other round fruits.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"76 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to track and segment fish without human annotations: a self-supervised deep learning approach","authors":"Alzayat Saleh, Marcus Sheaves, Dean Jerry, Mostafa Rahimi Azghadi","doi":"10.1007/s10044-024-01227-6","DOIUrl":"https://doi.org/10.1007/s10044-024-01227-6","url":null,"abstract":"<p>Tracking the movements and sizes of fish is crucial to understanding their ecology and behaviour. Knowing where fish migrate, how they interact with their environment, and how their size affects their behaviour can help ecologists develop more effective conservation and management strategies to protect fish populations and their habitats. Deep learning is a promising tool for analysing fish ecology from underwater videos. However, training deep neural networks (DNNs) for fish tracking and segmentation requires high-quality labels, which are expensive to obtain. We propose an alternative unsupervised approach that relies on spatial and temporal variations in video data to generate noisy pseudo-ground-truth labels, and we train a multi-task DNN using these pseudo-labels. Our framework consists of three stages: (1) an optical flow model generates the pseudo-labels using spatial and temporal consistency between frames, (2) a self-supervised model refines the pseudo-labels incrementally, and (3) a segmentation network uses the refined labels for training. We perform extensive experiments to validate our method on three public underwater video datasets and demonstrate its effectiveness for video annotation and segmentation. We also evaluate its robustness to different imaging conditions and discuss its limitations.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"7 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139946871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}