Yan Gui, Yiru Ou, Min Liang, Jianming Zhang, Zhihua Chen
{"title":"Spatio-temporal SiamFC: per-clip visual tracking with siamese non-local 3D convolutional networks and multi-template updating","authors":"Yan Gui, Yiru Ou, Min Liang, Jianming Zhang, Zhihua Chen","doi":"10.1007/s10044-024-01328-2","DOIUrl":"https://doi.org/10.1007/s10044-024-01328-2","url":null,"abstract":"<p>Recently, Siamese-network-based approaches have shown promising results on visual object tracking. These methods typically handle the tracking task by per-frame object detection and thus fail to fully exploit the rich temporal contexts among successive frames, which are important for accurate and robust object tracking. To benefit from the temporal information, in this paper, we investigate a per-clip tracking scheme in the Siamese-based approach and present a novel spatio-temporal SiamFC method for high-performance visual tracking. More specifically, we incorporate a non-local 3D fully convolutional network into a Siamese framework, which allows the model to act directly on the inputs of multiple templates and search video clips and to extract features from both spatial and temporal dimensions, thereby capturing the temporal information encoded in multiple video frames. We then propose a multi-template matching module to learn a representative tracking model using spatio-temporal template features and propagate informative target cues from the template set to the search clip using attention, which facilitates object search in clips. During inference, we employ a confident search region cropping and a dynamic multi-template update mechanism for stable and robust per-clip tracking. Experiments on six benchmark datasets show that our spatio-temporal SiamFC achieves competitive performance compared to state-of-the-art methods while running at approximately 60 FPS on GPU. 
Codes are available at https://github.com/liangminstu/STSiamFC.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"14 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust 3D unique descriptor for 3D object detection","authors":"Piyush Joshi, Alireza Rastegarpanah, Rustam Stolkin","doi":"10.1007/s10044-024-01326-4","DOIUrl":"https://doi.org/10.1007/s10044-024-01326-4","url":null,"abstract":"<p>3D object recognition techniques based on local surface features are widely used for robust recognition. This paper proposes a 3D object recognition technique named 3DU using local features computed based on the uniqueness of keypoints. The technique first transforms 3D keypoints into another 3D space using Local Reference Frame. This transformation helps to find a list of probable matched keypoints of a query keypoint. Further, the proposed uniqueness-based descriptor rejects false matches to obtain the best match from the list. The proposed technique is validated by experiments on the Bologna dataset and achieved 100% recognition rate. In real-time scenarios, scenes obtained by an RGBD camera primarily consist of point density variation, cluttered surfaces, and occlusions. Most of the 3D descriptors have not been validated on such scenes in literature. We have analyzed 3DU and top-rated techniques on three RGBD datasets (dataset proposed in this paper, Challenge and Willow datasets). The results obtained by experiments on the proposed dataset show that the top-rated techniques have failed to handle RGBD data and 3DU has outperformed all compared techniques. The inferior performance of all techniques on complex datasets such as Challenge and Willow has elicited a need to develop robust training-free recognition techniques. The proposed dataset and code of the proposed technique 3DU are openly available in Mendeley (anonymously). 
http://dx.doi.org/10.17632/rfvzy9jn5v.1.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"4 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Lv, Zhipeng He, Juan Chen, Fayang Duan, Shenyu Qiu, Jeng-Shyang Pan
{"title":"Weighted least squares twin support vector machine based on density peaks","authors":"Li Lv, Zhipeng He, Juan Chen, Fayang Duan, Shenyu Qiu, Jeng-Shyang Pan","doi":"10.1007/s10044-024-01311-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01311-x","url":null,"abstract":"<p>The least-squares twin support vector machine integrates all samples equally into the quadratic programming problem to calculate the optimal classification hyperplane, and does not distinguish the noise points in the samples, which causes the model to be sensitive to noise points and affected by the overlapping samples of positive and negative classes, and reduces the classification accuracy. To address the above problems, this paper proposes a weighted least squares twin support vector machine based on density peaks. Firstly, the algorithm combines the idea of density peaks to construct a new density weighting strategy, which gives a suitable weight value to this sample through the local density of the sample as well as the relative distance together to highlight the importance of the local center and reduce the influence of noise on the model; secondly, the separability between classes is defined according to the local density matrix, which reduces the influence of positive and negative class overlapping samples on the model and enhances the inter-class separability of the model; finally, an extensive weighting strategy is used in the model to assign weight values to both classes of samples to improve the robustness of the model to cross samples. 
The comparison experiments on the artificial dataset and the UCI dataset show that the algorithm in this paper can assign appropriate weights to different samples to improve the classification accuracy, while the experiments on the MNIST dataset demonstrate the effectiveness of the algorithm in this paper for real classification problems.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"27 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic K-means clustering over homogeneous manifolds","authors":"Chao Tan, Huan Zhao, Han Ding","doi":"10.1007/s10044-024-01330-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01330-8","url":null,"abstract":"<p>The original K-means algorithm is widely applied for clustering in Euclidean spaces. Nevertheless, due to the non-flat characteristics of the Riemannian manifold, standard Euclidean K-means algorithms yield inferior results on such data. To address this issue, this paper presents an intrinsic K-means clustering algorithm on homogeneous manifolds based on the geodesic distance. It allows the development of K-means-based methods for frequently occurring non-vector spaces in robotics, such as directional vector modelling <span>\\(\\mathbb{S}^2\\)</span> and pose estimation <span>\\(\\mathbb{S}^3\\)</span>. First, the Riemannian metric of the homogeneous manifold is delivered; on this basis, the intrinsic K-means is proposed using Karcher mean, and its convergence is proved. Then, differences between the proposed algorithm and four projection-based algorithms, such as embedding projection, stereographic projection, central projection and logarithmic projection, are discussed by investigating their distance preservation on manifolds. Finally, to evaluate the effectiveness of the proposed algorithm, it is compared with the projection-based algorithms on <span>\\(\\mathbb{S}^n\\)</span>. The results show that the intrinsic K-means achieves better clustering results, where the clustering accuracy of the proposed method is improved by 47% and 27% on average on artificial <span>\\(\\mathbb{S}^2\\)</span> and <span>\\(\\mathbb{S}^3\\)</span> datasets, respectively. 
Meanwhile, the noise immunity of the proposed algorithm becomes more evident as the noise ratio increases.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"15 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CrackYOLO: towards efficient dam crack detection for underwater scenes","authors":"Pengfei Shi, Shen Shao, Xinnan Fan, Yuanxue Xin, Zhongkai Zhou, Pengfei Cao, Xinyu Li, Sisi Zhu","doi":"10.1007/s10044-024-01310-y","DOIUrl":"https://doi.org/10.1007/s10044-024-01310-y","url":null,"abstract":"<p>Cracks are one of the main factors threatening the safety of dams. Automatic image object detection is the main way of detecting underwater dam cracks. However, traditional methods suffer from low crack detection speed, a high false alarm rate, and poor robustness. In addition, existing methods cannot deliver a satisfying detection result at a high detection speed. To solve these problems, we propose an efficient dam crack detection method for underwater scenes, called CrackYOLO. Firstly, to better integrate the multi-scale features without incurring excessive computational costs, we propose a feature fusion module in CrackYOLO. Next, we re-design the skip-connection in the network to get better features, compressing the overall model parameters. Then, we propose a feature extraction module called Res2C3, which combines semantic and location information. After that, we propose BCAtt to make features focus on both channel and location information. Finally, according to the characteristics of underwater dam crack images, we use a genetic algorithm to select the best values of the model's hyperparameters. The experimental results show that the proposed method detects underwater dam cracks robustly with less computational cost. 
Our CrackYOLO achieves 94.3% mAP@0.5 and 151 FPS on the underwater crack detection task, which enables real-time detection in practice.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"6 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CSMF-SPC: Multimodal Sentiment Analysis Model with Effective Context Semantic Modality Fusion and Sentiment Polarity Correction","authors":"Yuqiang Li, Wenxuan Weng, Chun Liu, Lin Li","doi":"10.1007/s10044-024-01320-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01320-w","url":null,"abstract":"<p>Multimodal sentiment analysis focuses on the fusion of multiple modalities. However, modality representation learning is a key step for better modality fusion, so how to fully learn the sentiment information of non-text modalities is a problem worth exploring. In addition, how to further improve the accuracy of sentiment polarity prediction also remains to be studied. To solve the above problems, we propose a multimodal sentiment analysis model with effective context semantic modality fusion and sentiment polarity correction (CSMF-SPC). Firstly, we design a low-rank multimodal fusion network based on context semantic modality (CSM-LRMFN). CSM-LRMFN uses a bi-directional long short-term memory network to extract the context semantic features of non-text modalities, and BERT to extract the features of the text modality. Then, CSM-LRMFN adopts a low-rank multimodal fusion method to fully extract the interaction information among modalities with contextual semantics. Different from previous studies, to improve the accuracy of sentiment polarity prediction, we design a weight self-adjusting sentiment polarity penalty loss function, which makes the model learn more sentiment features that are conducive to model prediction through backpropagation. Finally, a series of comparative experiments are conducted on the CMU-MOSI and CMU-MOSEI datasets. Compared with the current representative models, CSMF-SPC achieves better experimental results. 
Among them, the Acc-2 (including zero) metric is increased by 1.41% and 1.58% on the word-aligned and unaligned CMU-MOSI datasets respectively; it is improved by 1.50% and 2.14% respectively on the CMU-MOSEI dataset, which indicates that the improvement of CSMF-SPC is effective.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"109 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small object detection based on YOLOv8 in UAV perspective","authors":"Tao Ning, Wantong Wu, Jin Zhang","doi":"10.1007/s10044-024-01323-7","DOIUrl":"https://doi.org/10.1007/s10044-024-01323-7","url":null,"abstract":"<p>Unmanned aerial vehicle (UAV) image object detection is a challenging task, primarily due to various factors such as multi-scale objects, a high proportion of small objects, significant overlap between objects, poor image quality, and complex and dynamic scenes. To address these challenges, several improvements were made to the YOLOv8 model. Firstly, by pruning the feature mapping layers responsible for detecting large objects in the YOLOv8 model, a significant reduction in computational resources was achieved, rendering the model more lightweight. Simultaneously, a detection head fused with self-attention was introduced to enhance the detection capability for small objects. Secondly, the introduction of space-to-depth convolution in place of the original strided convolution and pooling operations facilitates more effective preservation of details in low-resolution images and small objects. Lastly, a multi-level feature fusion module was designed to merge feature maps from different network layers, enhancing the network's representation capability. Results on the Visdrone dataset demonstrate that the proposed model achieved a significant 4.7% improvement in mAP50 compared to YOLOv8, while reducing the parameter count to only 39% of the original model. Moreover, transfer experiments on the TT100k dataset showed a 3.2% increase in mAP50, validating the effectiveness of the improved model for small object detection tasks in UAV images. 
Our code is made available at https://github.com/Wtgonw/Imporved-yolov8.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"7 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hadia Mecheri, Islam Benamirouche, Feriel Fass, Djemel Ziou, Nassima Kadri
{"title":"Prediction of rare events in the operation of household equipment using co-evolving time series","authors":"Hadia Mecheri, Islam Benamirouche, Feriel Fass, Djemel Ziou, Nassima Kadri","doi":"10.1007/s10044-024-01322-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01322-8","url":null,"abstract":"<p>In this study, we propose a probabilistic approach to predict rare events by exploiting coevolving time series. The probability of a failure is calculated based on the weighted autologistic regression of these time series, accounting for specific characteristics of failures such as data imbalance. We estimate the model parameters using the maximum likelihood of the Bernoulli process. By incorporating the temporal behaviors of the various phenomena underlying the occurrence of failures and the nature of the data, we improve the prediction of rare events. Evaluations on both synthetic and real datasets demonstrate that our approach outperforms existing methods in predicting home equipment failures.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"11 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging two-level deep learning classifiers for 2D shape recognition to automatically solve geometry math word problems","authors":"Archana Boob, Mansi Radke","doi":"10.1007/s10044-024-01321-9","DOIUrl":"https://doi.org/10.1007/s10044-024-01321-9","url":null,"abstract":"<p>In mathematics, closed-domain systems for Question Answering (QA) have shown a distinct advantage over open-domain systems, primarily due to their focused use of supporting knowledge bases. This advantage is particularly salient in the era of online and hybrid tutoring, where automatic QA systems have become vital in addressing complex mathematical problems. This paper focuses on the challenge of geometric shape recognition in math word problems (MWPs) accompanied by figures that aid in the solution process. Existing systems rely on manually inputted shape information, which is less efficient. In this work, a novel customized two-layer deep learning model ‘2DGeoShapeNet’ for 2D geometric shape recognition has been developed. At the first level, it recognizes images in broad categories such as circles, quadrilaterals, or triangles. At the second level, the subtypes of quadrilaterals and triangles are detected. The proposed 2D shape detection model is trained and tested on a newly created integrated dataset, ‘GeoCQT’ (Circle, Quadrilateral, and Triangle), consisting of 6K+ images. The proposed deep learning technique achieved 93.98% accuracy on the ‘GeoCQT’ dataset. The performance of the proposed techniques is also evaluated on other geometry math word problem solver datasets such as GeoS, Geometry3K, GeoQA, PGDP5K, and PGPS9K. The proposed technique is compared with the already-published work that employed traditional image processing techniques for 2D shape detection. 
Findings highlight the superiority of two-level deep learning classifiers in detecting geometric shapes, marking a significant advancement in automated geometry problem-solving.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"16 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdullah Al Mamun, Md Imranul Islam, Md Abu Sayeed Shohag, Wael Al-Kouz, KM Abdun Noor
{"title":"Multilinear principal component analysis-based tensor decomposition for fabric weave pattern recognition from high-dimensional streaming data","authors":"Abdullah Al Mamun, Md Imranul Islam, Md Abu Sayeed Shohag, Wael Al-Kouz, KM Abdun Noor","doi":"10.1007/s10044-024-01318-4","DOIUrl":"https://doi.org/10.1007/s10044-024-01318-4","url":null,"abstract":"<p>The modern textile industry integrates video sensors with automated fabric reeling systems for real-time fabric weave pattern inspection. This automation system lessens the human-vision-based cognitive load and improves fabric weave pattern inspection work. However, this automation system poses a unique challenge, particularly when dealing with high-dimensional streaming data from high-precision digital microscope cameras. The complexity arises from the continuous acquisition and management of such high-dimensional streaming video data. Considering the challenges posed by dimensionality reduction in high-dimensional data, this study employs multilinear principal component analysis (MPCA)-based tensor decomposition, a statistical technique designed to effectively reduce high-dimensional datasets into low-dimensional features. This paper proposes an innovative method for fabric weave pattern recognition (FWPR) by leveraging MPCA-based tensor decomposition to extract low-dimensional features from the high-dimensional fabric’s surface texture descriptor tensor (STDT). This proposed method replicates fabric pattern monitoring in automated fabric reeling systems by integrating a digital microscope camera to capture high-dimensional streaming video data from fabric surface texture features. Subsequently, the high-dimensional video data is converted into sequential image frames representing different fabric weave patterns. 
These image frames are processed with local binary pattern (LBP) and gray-level co-occurrence matrix (GLCM) methods to aggregate the fabric’s surface pattern features and construct the high-dimensional STDT. This STDT is subsequently decomposed into low-dimensional features by leveraging MPCA, resulting in an impressive 99.99% reduction in dimension. A supervised machine learning method utilizes the extracted low-dimensional features to enable FWPR, demonstrating the superiority of the proposed method over the benchmark methods in evaluation.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"11 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}