Title: Multi-person 3D pose estimation from unlabelled data
Authors: Daniel Rodriguez-Criado, Pilar Bachiller-Burgos, George Vogiatzis, Luis J. Manso
Machine Vision and Applications, 2024-04-06. DOI: 10.1007/s00138-024-01530-6

Abstract: Its numerous applications make multi-human 3D pose estimation a remarkably impactful area of research. Nevertheless, it presents several challenges, especially when approached using multiple views and regular RGB cameras as the only input. First, each person must be uniquely identified in the different views. Secondly, the method must be robust to noise, partial occlusions, and views where a person may not be detected. Thirdly, many pose estimation approaches rely on environment-specific annotated datasets that are frequently prohibitively expensive and/or require specialised hardware. In this work, we address these three challenges with the help of self-supervised learning; specifically, this is the first multi-camera, multi-person data-driven approach that does not require an annotated dataset. We present a three-stage pipeline and a rigorous evaluation providing evidence that our approach runs faster than other state-of-the-art algorithms, with comparable accuracy, and, most importantly, does not require annotated datasets. The pipeline is composed of a 2D skeleton detection step, followed by a graph neural network that estimates cross-view correspondences of the people in the scene, and a multi-layer perceptron that transforms the 2D information into 3D pose estimations. Our proposal comprises the last two steps and is compatible with any 2D skeleton detector as input. These two models are trained in a self-supervised manner, thus avoiding the need for datasets annotated with 3D ground-truth poses.
{"title":"USIR-Net: sand-dust image restoration based on unsupervised learning","authors":"Yuan Ding, Kaijun Wu","doi":"10.1007/s00138-024-01528-0","DOIUrl":"https://doi.org/10.1007/s00138-024-01528-0","url":null,"abstract":"<p>In sand-dust weather, the influence of sand-dust particles on imaging equipment often results in images with color deviation, blurring, and low contrast, among other issues. These problems making many traditional image restoration methods unable to accurately estimate the semantic information of the images and consequently resulting in poor restoration of clear images. Most current image restoration methods in the field of deep learning are based on supervised learning, which requires pairing and labeling a large amount of data, and the possibility of manual annotation errors. In light of this, we propose an unsupervised sand-dust image restoration network. The overall model adopts an improved CycleGAN to fit unpaired sand-dust images. Firstly, multiscale skip connections in the multiscale cascaded attention module are used to enhance the feature fusion effect after downsampling. Secondly, multi-head convolutional attention with multiple input concatenations is employed, with each head using different kernel sizes to improve the ability to restore detail information. Finally, the adaptive decoder-encoder module is used to achieve adaptive fitting of the model and output the restored image. According to the experiments conducted on the dataset, the qualitative and quantitative indicators of USIR-Net are superior to the selected comparison algorithms, furthermore, in additional experiments conducted on haze removal and underwater image enhancement, we have demonstrated the wide applicability of our model.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"94 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140587365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ssman: self-supervised masked adaptive network for 3D human pose estimation","authors":"","doi":"10.1007/s00138-024-01514-6","DOIUrl":"https://doi.org/10.1007/s00138-024-01514-6","url":null,"abstract":"<h3>Abstract</h3> <p>The modern deep learning-based models for 3D human pose estimation from monocular images always lack the adaption ability between occlusion and non-occlusion scenarios, which might restrict the performance of current methods when faced with various scales of occluded conditions. In an attempt to tackle this problem, we propose a novel network called self-supervised masked adaptive network (SSMAN). Firstly, we leverage different levels of masks to cover the richness of occlusion in fully in-the-wild environment. Then, we design a multi-line adaptive network, which could be trained with various scales of masked images in parallel. Based on this masked adaptive network, we train it with self-supervised learning to enforce the consistency across the outputs under different mask ratios. Furthermore, a global refinement module is proposed to leverage global features of the human body to refine the human pose estimated solely by local features. We perform extensive experiments both on the occlusion datasets like 3DPW-OCC and OCHuman and general datasets such as Human3.6M and 3DPW. The results show that SSMAN achieves new state-of-the-art performance on both lightly and heavily occluded benchmarks and is highly competitive with significant improvement on standard benchmarks.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"6 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140316482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Kernel based local matching network for video object segmentation
Authors: Guoqiang Wang, Lan Li, Min Zhu, Rui Zhao, Xiang Zhang
Machine Vision and Applications, 2024-03-25. DOI: 10.1007/s00138-024-01524-4

Abstract: Recently, methods based on space-time memory networks have achieved advanced performance in semi-supervised video object segmentation and attracted wide attention. However, these methods still have a critical limitation: the non-local matching they rely on is easily distracted by similar objects, which seriously limits segmentation performance. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) that uses local matching instead of non-local matching. First, KAMNet uses a spatio-temporal attention mechanism to enhance the model's discrimination between foreground objects and background areas. Then KAMNet utilizes a Gaussian kernel to guide the matching between the current frame and the reference set. Because the Gaussian kernel decays away from its center, it limits the matching to the central region, thus achieving local matching. KAMNet achieves a good speed-accuracy trade-off on the benchmark datasets DAVIS 2016 (J&F of 87.6%) and DAVIS 2017 (J&F of 76.0%) at 0.12 seconds per frame.
Title: Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark
Authors: David Bojanić, Kristijan Bartol, Josep Forest, Tomislav Petković, Tomislav Pribanić
Machine Vision and Applications, 2024-03-23. DOI: 10.1007/s00138-024-01510-w

Abstract: Recent 3D registration methods are mostly learning-based and either find correspondences in feature space and match them, or directly estimate the registration transformation from the given point cloud features. These feature-based methods therefore have difficulties generalizing to point clouds that differ substantially from their training data. The issue is not readily apparent because current benchmark definitions cannot provide in-depth analysis and are biased toward similar data. We therefore propose a methodology to create a 3D registration benchmark from a given point cloud dataset that provides a more informative evaluation of a method than existing benchmarks. Using this methodology, we create a novel FAUST-partial (FP) benchmark, based on the FAUST dataset, with several difficulty levels. The FP benchmark addresses the limitations of current benchmarks, namely the lack of data and parameter-range variability, and allows the strengths and weaknesses of a 3D registration method to be evaluated with respect to a single registration parameter. Using the new FP benchmark, we provide a thorough analysis of the current state-of-the-art methods and observe that they still struggle to generalize to severely different out-of-sample data. We therefore propose a simple, featureless, traditional 3D registration baseline based on the weighted cross-correlation between two given point clouds. Our method achieves strong results on current benchmark datasets, outperforming most deep learning methods. Our source code is available at github.com/DavidBoja/exhaustive-grid-search.
Title: AFMCT: adaptive fusion module based on cross-modal transformer block for 3D object detection
Authors: Bingli Zhang, Yixin Wang, Chengbiao Zhang, Junzhao Jiang, Zehao Pan, Jin Cheng, Yangyang Zhang, Xinyu Wang, Chenglei Yang, Yanhui Wang
Machine Vision and Applications, 2024-03-23. DOI: 10.1007/s00138-024-01509-3

Abstract: Lidar and camera are essential sensors for environment perception in autonomous driving. However, fully fusing heterogeneous data from multiple sources remains a non-trivial challenge. As a result, 3D object detection based on multi-modal sensor fusion is often inferior to single-modal methods based only on Lidar, which indicates that multi-sensor machine vision still needs development. In this paper, we propose an adaptive fusion module based on a cross-modal transformer block (AFMCT) for 3D object detection, utilizing a bidirectional enhancing strategy. Specifically, we first enhance the image features by extracting attention-based point features with a cross-modal transformer block and linking them in a concatenation fashion; a second cross-modal transformer block then acts on the enhanced image features to strengthen the point features with image semantic information. Extensive experiments on the 3D detection benchmark of the KITTI dataset reveal that the proposed structure can significantly improve the detection accuracy of Lidar-only methods and outperforms existing advanced multi-sensor fusion modules by at least 0.45%, indicating that our method could be a feasible way to improve 3D object detection based on multi-sensor fusion.
Title: Hyperspectral image dynamic range reconstruction using deep neural network-based denoising methods
Authors: Loran Cheplanov, Shai Avidan, David J. Bonfil, Iftach Klapp
Machine Vision and Applications, 2024-03-22. DOI: 10.1007/s00138-024-01523-5

Abstract: Hyperspectral (HS) measurement is among the most useful tools in agriculture for early disease detection. However, the cost of HS cameras that can perform the desired detection tasks is prohibitive: typically fifty thousand to hundreds of thousands of dollars. In a previous study at the Agricultural Research Organization's Volcani Institute (Israel), a low-cost, high-performing HS system was developed that included a point spectrometer and optical components. Its main disadvantage was the long shooting time for each image, which strongly depends on the predetermined integration time of the point spectrometer. While shortening the integration time from a typical value of around 200 ms to the 10 ms range is essential for performing monitoring tasks in a reasonable time, it deteriorates the dynamic range of the captured scene. In this work, we suggest correcting this by learning the transformation from data measured with a short integration time to data measured with a long integration time. The reduced dynamic range and consequent low SNR were successfully overcome using three deep neural network models based on a denoising auto-encoder, DnCNN, and LambdaNetworks architectures as backbones. The best model was based on DnCNN, using a combined loss function of ℓ2 and Kullback-Leibler divergence on images with 20 consecutive channels. Over the full spectrum, the model achieved a mean PSNR of 30.61 and a mean SSIM of 0.9, improving on the 10 ms measurements' mean PSNR and mean SSIM by 60.43% and 94.51%, respectively.
Title: Point cloud registration with quantile assignment
Authors: Ecenur Oğuz, Yalım Doğan, Uğur Güdükbay, Oya Karaşan, Mustafa Pınar
Machine Vision and Applications, 2024-03-19. DOI: 10.1007/s00138-024-01517-3

Abstract: Point cloud registration is a fundamental problem in computer vision. The problem encompasses critical tasks such as feature estimation, correspondence matching, and transformation estimation. The point cloud registration problem can be cast as a quantile matching problem. We refined the quantile assignment algorithm by integrating prevalent feature descriptors and transformation estimation methods to enhance the correspondence between the source and target point clouds. We evaluated the performances of these descriptors and methods with our approach through controlled experiments on a dataset we constructed using well-known 3D models. This systematic investigation led us to identify the most suitable methods for complementing our approach. Subsequently, we devised a new end-to-end, coarse-to-fine pairwise point cloud registration framework. Finally, we tested our framework on indoor and outdoor benchmark datasets and compared our results with state-of-the-art point cloud registration methods.
Title: An image quality assessment method based on edge extraction and singular value for blurriness
Authors: Lei Zhou, Chuanlin Liu, Amit Yadav, Sami Azam, Asif Karim
Machine Vision and Applications, 2024-03-19. DOI: 10.1007/s00138-024-01522-6

Abstract: The automatic assessment of perceived image quality is crucial in the field of image processing. To this end, we propose an image quality assessment (IQA) method for blurriness. Instead of the single feature used in traditional IQA algorithms, the method extracts both gradient and singular-value features. Because existing public image quality assessment datasets are too small to support deep learning, machine learning was introduced to fuse features from multiple domains, and a new no-reference (NR) IQA method for blurriness, denoted feature-fusion IQA (Ffu-IQA), was proposed. Ffu-IQA uses a probabilistic model to estimate the probability of blur at each detected edge in the image and then uses machine learning to aggregate the probability information into an edge quality score. It then computes a singular-value score from the singular values obtained by singular value decomposition of the image matrix. Finally, machine learning pooling is used to obtain the overall quality score. Ffu-IQA achieves PLCC scores of 0.9570 and 0.9616 on CSIQ and TID2013, respectively, and SROCC scores of 0.9380 and 0.9531, which are better than most traditional IQA methods for blurriness.
{"title":"Temporal teacher with masked transformers for semi-supervised action proposal generation","authors":"Selen Pehlivan, Jorma Laaksonen","doi":"10.1007/s00138-024-01521-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01521-7","url":null,"abstract":"<p>By conditioning on unit-level predictions, anchor-free models for action proposal generation have displayed impressive capabilities, such as having a lightweight architecture. However, task performance depends significantly on the quality of data used in training, and most effective models have relied on human-annotated data. Semi-supervised learning, i.e., jointly training deep neural networks with a labeled dataset as well as an unlabeled dataset, has made significant progress recently. Existing works have either primarily focused on classification tasks, which may require less annotation effort, or considered anchor-based detection models. Inspired by recent advances in semi-supervised methods on anchor-free object detectors, we propose a teacher-student framework for a two-stage action detection pipeline, named Temporal Teacher with Masked Transformers (TTMT), to generate high-quality action proposals based on an anchor-free transformer model. Leveraging consistency learning as one self-training technique, the model jointly trains an anchor-free student model and a gradually progressing teacher counterpart in a mutually beneficial manner. As the core model, we design a Transformer-based anchor-free model to improve effectiveness for temporal evaluation. We integrate bi-directional masks and devise encoder-only Masked Transformers for sequences. Jointly training on boundary locations and various local snippet-based features, our model predicts via the proposed scoring function for generating proposal candidates. Experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of our model for temporal proposal generation task.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"67 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140150877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}