Title: Multi-view spectral clustering based on constrained Laplacian rank
Authors: Jinmei Song, Baokai Liu, Yao Yu, Kaiwu Zhang, Shiqiang Du
Journal: Machine Vision and Applications | DOI: https://doi.org/10.1007/s00138-023-01497-w | Published: 2024-01-12

Abstract: The graph-based approach is a representative clustering method among multi-view clustering algorithms. However, it remains a challenge to quickly acquire complementary information in multi-view data and to cluster effectively when the quality of the initially constructed data graph is inadequate. We therefore propose CLRSC, a new graph-based method for multi-view spectral clustering based on constrained Laplacian rank. Our contributions are: (1) self-representation learning and constrained Laplacian rank (CLR) are extended to the multi-view setting and connected in a unified framework that learns a common affinity matrix; (2) to achieve the overall optimization, a graph learning method based on constrained Laplacian rank is constructed and combined with spectral clustering; (3) an iterative optimization procedure is designed and shown to converge. Extensive experiments are carried out on five benchmark datasets. On the MSRC-v1 and BBCSport datasets, the accuracy (ACC) of the method is 10.95% and 4.61% higher, respectively, than the best comparison algorithm.
Title: High-accuracy 3D locators tracking in real time using monocular vision
Authors: C. Elmo Kulanesan, P. Vacher, L. Charleux, E. Roux
Journal: Machine Vision and Applications | DOI: https://doi.org/10.1007/s00138-023-01498-9 | Published: 2024-01-11

Abstract: In medical applications, precise localization of medical instruments and bone structures is crucial for computer-assisted surgical interventions. In orthopedic surgery, existing devices typically rely on stereoscopic vision; their purpose is to aid the surgeon in screw fixation of prostheses or in bone removal. This article addresses the challenge of localizing, with a single camera, a rigid object carrying randomly arranged planar markers. This capability is especially vital in medical situations where accurate object alignment relative to a camera is necessary at distances of 80 cm to 120 cm, and where a size limit of a few tens of centimeters ensures that the locator does not obstruct the work area. The rigid locator consists of a solid body onto whose surface a set of planar markers (ArUco) is glued. The markers are randomly distributed over the surface so that at least two are visible whatever the orientation of the locator. Calibrating the locator involves finding the relative positions of the individual planar elements and is based on a bundle adjustment approach. One of the main known difficulties with planar markers is pose ambiguity; to solve this problem, our method formulates an efficient initial solution for the optimization step. After calibration, the positioning uncertainties of the locator are better than two-tenths of a cubic millimeter and one-tenth of a degree, regardless of the locator's orientation in space. To assess the proposed method, the locator is rigidly attached to a stylus about twenty centimeters long. With this approach, the tip of the stylus, seen by a 16.1-megapixel camera at a distance of about 1 m, is localized in real time within a cube of less than 1 mm per side. A surface registration application using the stylus on an artificial scapula is demonstrated.
{"title":"Local region-learning modules for point cloud classification","authors":"Kaya Turgut, Helin Dutagaci","doi":"10.1007/s00138-023-01495-y","DOIUrl":"https://doi.org/10.1007/s00138-023-01495-y","url":null,"abstract":"<p>Data organization via forming local regions is an integral part of deep learning networks that process 3D point clouds in a hierarchical manner. At each level, the point cloud is sampled to extract representative points and these points are used to be centers of local regions. The organization of local regions is of considerable importance since it determines the location and size of the receptive field at a particular layer of feature aggregation. In this paper, we present two local region-learning modules: Center Shift Module to infer the appropriate shift for each center point, and Radius Update Module to alter the radius of each local region. The parameters of the modules are learned through optimizing the loss associated with the particular task within an end-to-end network. We present alternatives for these modules through various ways of modeling the interactions of the features and locations of 3D points in the point cloud. We integrated both modules independently and together to the PointNet++ and PointCNN object classification architectures, and demonstrated that the modules contributed to a significant increase in classification accuracy for the ScanObjectNN data set consisting of scans of real-world objects. Our further experiments on ShapeNet data set showed that the modules are also effective on 3D CAD models.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"307 5 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138823537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast anchor-based graph-regularized low-rank representation approach for large-scale subspace clustering","authors":"Lili Fan, Guifu Lu, Ganyi Tang, Yong Wang","doi":"10.1007/s00138-023-01487-y","DOIUrl":"https://doi.org/10.1007/s00138-023-01487-y","url":null,"abstract":"<p>Graph-regularized low-rank representation (GLRR) is an important subspace clustering (SC) algorithm, which has been widely used in pattern recognition and other related fields. It can not only represent the global structure of data, but also capture the nonlinear geometric information. However, GLRR has encountered bottlenecks in dealing with large-scale SC problems since it contains singular value decomposition and similarity matrix construction. To solve this problem, we propose a novel method, i.e., fast anchor-based graph-regularized low-rank representation (FA-GLRR) approach for large-scale subspace clustering. Specifically, anchor graph is first used to accelerate the construction of similarity matrix, and then, some equivalent transformations are given to transform large-scale problems into small-scale problems. These two strategies reduce the computational complexity of GLRR dramatically. Experiments on several common datasets demonstrate the superiority of FA-GLRR in terms of time performance and clustering performance.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"21 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138745667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biomimetic oculomotor control with spiking neural networks","authors":"Taasin Saquib, Demetri Terzopoulos","doi":"10.1007/s00138-023-01494-z","DOIUrl":"https://doi.org/10.1007/s00138-023-01494-z","url":null,"abstract":"<p>Spiking neural networks (SNNs) are comprised of artificial neurons that, like their biological counterparts, communicate via electrical spikes. SNNs have been hailed as the next wave of deep learning as they promise low latency and low-power consumption when run on neuromorphic hardware. Current deep neural network models for computer vision often require power-hungry GPUs to train and run, making them great candidates to replace with SNNs. We develop and train a biomimetic, SNN-driven, neuromuscular oculomotor controller for a realistic biomechanical model of the human eye. Inspired by the ON and OFF bipolar cells of the retina, we use event-based data flow in the SNN to direct the necessary extraocular muscle-driven eye movements. We train our SNN models from scratch, using modified deep learning techniques. Classification tasks are straightforward to implement with SNNs and have received the most research attention, but visual tracking is a regression task. We use surrogate gradients and introduce a linear layer to convert membrane voltages from the final spiking layer into the desired outputs. Our SNN foveation network enhances the biomimetic properties of the virtual eye model and enables it to perform reliable visual tracking. Overall, with event-based data processed by an SNN, our oculomotor controller successfully tracks a visual target while activating 87.3% fewer neurons than a conventional neural network.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"226 1 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138717573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DetailPoint: detailed feature learning on point clouds with attention mechanism","authors":"Ying Li, Jincheng Bai, Huankun Sheng","doi":"10.1007/s00138-023-01491-2","DOIUrl":"https://doi.org/10.1007/s00138-023-01491-2","url":null,"abstract":"<p>Point cloud analysis is an important part of 3D geometric processing. It has been widely used in many fields, such as automatic driving and robots. Although great progress has been made in recent years, there are still some unresolved problems. For example, current methods devote employing MLP to extract local features after search k neighbor points, they cannot effectively model the dependency relationship between the anchor point and k neighboring points. In addition, the prevailing models may not exploit the inherent structural similarities present in the global scope. To solve these issues, we propose a feature extraction model named DetailPoint to get detailed local information and long-range global dependency of point clouds. DetailPoint possess three units: the shallow local learning unit, the deep local learning unit and the deep global learning unit. We first use the SLL to extract shallow local features, and then use the DLL to learn deep local features. In these two units, we design a dual-path extraction method to acquire detail local features with dependencies. Finally, the DGL unit is employed to improve the generalization ability of local features and establish global interaction. These three units are connected in series to form our DetailPoint. We evaluated the performance of our model on four datasets, ScanObjectNN and ModelNet40 for shape classification, the ShapeNet dataset for part segmentation, and the S3DIS dataset for sementatic segmentations. The experimental results demonstrate that DetailPoint is capable of expressing point clouds more effectively, resulting in superior performance compared to existing methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"72 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138717065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cancelable face recognition using phase retrieval and complex principal component analysis network","authors":"Zhuhong Shao, Leding Li, Zuowei Zhang, Bicao Li, Xilin Liu, Yuanyuan Shang, Bin Chen","doi":"10.1007/s00138-023-01496-x","DOIUrl":"https://doi.org/10.1007/s00138-023-01496-x","url":null,"abstract":"<p>Considering the necessity of sensitive information protection in face image, a cancelable template generation model for multimodal face images is proposed in this paper. Firstly, the visual meaningful face images are transformed into phase-only functions through phase retrieval in gyrator domain. Then random projection and chaotic-based mask are constituted into modulation for achieving revocability and distinguishability. The interim results are mapped to a higher-dimensional space using random Fourier features. Followed by two-stage complex-valued principal component analysis, the convolutional filters are learned efficiently. Together with binary hashing and decimal coding, local histogram features are obtained and forwarded to SVM for training and recognition. Experiments performed on three publicly multimodal datasets demonstrate that the proposed algorithm can obtain higher accuracy, precision, recall and F-score in comparison with some existing algorithms while the templates are non-invertible and easy to revoke.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"150 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138686559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot object detection via data augmentation and distribution calibration","authors":"Songhao Zhu, Kai Zhang","doi":"10.1007/s00138-023-01486-z","DOIUrl":"https://doi.org/10.1007/s00138-023-01486-z","url":null,"abstract":"<p>General object detection has been widely developed and studied over the past few years, while few-shot object detection is still in the exploratory stage. Learning effective knowledge from a limited number of samples is challenging, as the trained model is prone to over-fitting due to biased feature distributions in a few training samples. There exist two significant challenges in traditional few-shot object detection methods: (1) The scarcity of extreme samples aggravates the proposal distribution bias, hindering the evolution of regions of interest (ROI) heads toward new categories; (2) Due to the scarce of the samples in novel categories, the region proposal network (RPN) is identified as a key source of classification errors, resulting in a significant decrease in detection performance on novel categories. To overcome these challenges, an effective knowledge transfer method based on distributed calibration and data augmentation is proposed. Firstly, the biased novel category distributions are calibrated with the basic category distributions; secondly, a drift compensation strategy is utilized to reduce the negative impact on new categories classifications during the fine-tuning process; thirdly, synthetic features are obtained from calibrated distributions of novel categories and added to the subsequent training process. Furthermore, the domain-aware data augmentation is utilized to alleviate the issue of data scarcity by exploiting the cross-image foreground—background mixture to increase the diversity and rationality of augmented data. Experimental results demonstrate the effectiveness and applicability of the proposed method.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"195 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138553152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pixel representations, sampling, and label correction for semantic part detection","authors":"Jiao-Chuan Huang, You-Lin Lin, Wen-Chieh Fang","doi":"10.1007/s00138-023-01493-0","DOIUrl":"https://doi.org/10.1007/s00138-023-01493-0","url":null,"abstract":"<p>Semantic part detection within an object is of importance in the field of computer vision. This study proposes a novel approach to semantic part detection that starts by employing a convolutional neural network to concatenate a selection of feature maps from the network into a long vector for pixel representation. Using this dedicated pixel representation, we implement a range of techniques, such as Poisson disk sampling for pixel sampling and Poisson matting for pixel label correction. These techniques efficiently facilitate the training of a practical pixel classifier for part detection. Our experimental exploration investigated various factors that affect the model’s performance, including training data labeling (with or without the aid of Poisson matting), hypercolumn representation dimensionality, neural network architecture, post-processing techniques, and pixel classifier selection. In addition, we conducted a comparative analysis of our approach with established object detection methods.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"1 6","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138524233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial-temporal graph-guided global attention network for video-based person re-identification","authors":"Xiaobao Li, Wen Wang, Qingyong Li, Jiang Zhang","doi":"10.1007/s00138-023-01489-w","DOIUrl":"https://doi.org/10.1007/s00138-023-01489-w","url":null,"abstract":"<p>Global attention learning has been extensively applied in video-based person re-identification due to its superiority in capturing contextual correlations. However, existing global attention learning methods usually adopt the conventional neural network to model non-Euclidean contextual correlations, resulting in a limited representation ability. Inspired by the graph-structure property of the contextual correlations, we propose a spatial-temporal graph-guided global attention network (STG<span>(^3)</span>A) for video-based person re-identification. STG<span>(^3)</span>A comprises two graph-guided attention modules to capture the spatial contexts within a frame and temporal contexts across all frames in a sequence for global attention learning. Furthermore, the graphs from both modules are encoded as graph representations, which combine with weighted representations to grasp the spatial-temporal contextual information adequately for video feature learning. To reduce the effect of noisy graph nodes and learn robust graph representations, a graph node attention is developed to trade-off the importance of each graph node, leading to noise-tolerant graph models. Finally, we design a graph-guided fusion scheme to integrate the representations output by these two attentive modules for a more compact video feature. Extensive experiments on MARS and DukeMTMCVideoReID datasets demonstrate the superior performance of the STG<span>(^3)</span>A.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"55 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138524232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}