2021 17th International Conference on Machine Vision and Applications (MVA): Latest Publications

Self-Supervised Deep Fisheye Image Rectification Approach using Coordinate Relations
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511349
Masaki Hosono, E. Simo-Serra, Tomonari Sonoda
Abstract: With the ascent of wearable camera, dashcam, and autonomous vehicle technology, fisheye lens cameras are becoming more widespread. Unlike regular cameras, videos and images taken with a fisheye lens suffer from significant lens distortion, which has detrimental effects on image processing algorithms. When the camera parameters are known, it is straightforward to correct the distortion; without known camera parameters, however, distortion correction becomes a non-trivial task. While learning-based approaches exist, they rely on complex datasets and have limited generalization. In this work, we propose a CNN-based approach that can be trained with readily available data. We exploit the fact that relationships between pixel coordinates remain stable after homogeneous distortions to design an efficient rectification model. Experiments performed on the Cityscapes dataset show the effectiveness of our approach. Our code is available at https://github.com/MasakHosono/SelfSupervisedFisheyeRectification.
Citations: 1
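As background for the entry above: once a distortion parameter is known (whether hand-calibrated or predicted by a network such as the one described), rectification reduces to a per-pixel coordinate remapping. The sketch below illustrates this with a simple one-parameter radial model; the model, the assumed focal length, and the function name are illustrative assumptions, not the paper's rectification network.

```python
import numpy as np

def rectify_radial(img, k, f=None):
    """Undistort an image under a one-parameter radial model (illustrative sketch).

    For each output (undistorted) pixel we look up its source location in the
    distorted input using x_d = x_u * (1 + k * r_u^2); the focal length f is an
    assumption (defaults to max(h, w) pixels), not a calibrated value.
    """
    h, w = img.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    f = float(f) if f is not None else float(max(h, w))

    ys, xs = np.mgrid[0:h, 0:w]
    xn, yn = (xs - cx) / f, (ys - cy) / f          # normalized output coordinates
    scale = 1.0 + k * (xn ** 2 + yn ** 2)          # radial distortion factor

    src_x = np.clip(np.rint(xn * scale * f + cx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(yn * scale * f + cy), 0, h - 1).astype(int)
    return img[src_y, src_x]                       # nearest-neighbour remapping
```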
A baseline for semi-supervised learning of efficient semantic segmentation models
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511402
I. Grubisic, Marin Orsic, Sinisa Segvic
Abstract: Semi-supervised learning is especially interesting in the dense prediction context due to the high cost of pixel-level ground truth. Unfortunately, most such approaches are evaluated on outdated architectures, which hampers research due to very slow training and high GPU RAM requirements. We address this concern by presenting a simple and effective baseline which works very well on both standard and efficient architectures. Our baseline is based on one-way consistency and nonlinear geometric and photometric perturbations. We show the advantage of perturbing only the student branch and present a plausible explanation of this behaviour. Experiments on Cityscapes and CIFAR-10 demonstrate competitive performance with respect to prior work.
Citations: 3
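To make the one-way consistency idea concrete, here is a minimal sketch of such a loss term on unlabeled images: the clean input produces a frozen teacher target and only the perturbed student branch receives gradients. It assumes photometric-only perturbations for simplicity (geometric perturbations would also require warping the target), and it is not the paper's exact training code.

```python
import torch
import torch.nn.functional as F

def one_way_consistency_loss(model, unlabeled, perturb):
    """One-way consistency on an unlabeled batch (illustrative sketch).

    unlabeled: (N, 3, H, W) images; perturb: photometric augmentation callable.
    """
    with torch.no_grad():                          # teacher branch: clean input, no gradient
        teacher_logits = model(unlabeled)          # (N, C, H, W)
        target = teacher_logits.argmax(dim=1)      # hard per-pixel pseudo-labels

    student_logits = model(perturb(unlabeled))     # student branch: perturbed input, trained
    return F.cross_entropy(student_logits, target)
```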
Occlusion-Robust 3D Hand Pose Estimation from a Single RGB Image
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511389
Asuka Ishii, Gaku Nakano, Tetsuo Inoshita
Abstract: We propose an occlusion-robust network for 3D hand pose estimation from a single RGB image. Severe occlusions degrade the estimation accuracy of not only occluded keypoints but also visible keypoints. Since existing methods based on deep neural networks perform convolutions on all keypoints regardless of visibility, inaccurate features from occluded keypoints affect the localization of visible keypoints. To suppress the influence of occluded keypoints, our proposed network consists of three modules: a 2D heatmap generator, a parallel sub-joints network (PSJNet), and an ensemble network (EN). First, the 2D positions of all keypoints in the input image are predicted as a 2D heatmap, as in existing methods. Then PSJNet, which consists of several graph convolutional networks (GCNs) running in parallel, estimates multiple incomplete 3D poses in which some of the keypoints have been removed. Each GCN performs convolutions on a limited number of keypoints, so features from occluded keypoints do not spread to the whole pose. Finally, EN merges the incomplete poses into a single 3D pose by selecting accurate positions from them. Experimental results on the public RHD dataset demonstrate that the proposed method outperforms existing methods under both small and severe occlusions.
Citations: 0
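The split-then-merge structure described above can be illustrated with a toy module: each branch lifts only a subset of 2D keypoints to 3D, so information from keypoints outside its subset cannot leak in, and a small head fuses the partial poses. This sketch substitutes plain MLPs for the paper's GCNs, uses disjoint keypoint subsets rather than the paper's overlapping removals, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ParallelSubJointsSketch(nn.Module):
    """Toy stand-in for the PSJNet + ensemble idea (illustrative only)."""

    def __init__(self, num_joints=21, subsets=((0, 5), (5, 13), (13, 21))):
        super().__init__()
        self.subsets = subsets
        # One lifting branch per keypoint subset (MLPs in place of GCNs).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear((b - a) * 2, 128), nn.ReLU(),
                          nn.Linear(128, (b - a) * 3))
            for a, b in subsets)
        # Ensemble head fusing the concatenated partial 3D poses.
        self.ensemble = nn.Linear(num_joints * 3, num_joints * 3)

    def forward(self, kp2d):                       # kp2d: (N, num_joints, 2)
        parts = []
        for (a, b), branch in zip(self.subsets, self.branches):
            out = branch(kp2d[:, a:b].flatten(1))  # lift this subset only
            parts.append(out.view(-1, b - a, 3))
        merged = torch.cat(parts, dim=1)           # (N, num_joints, 3)
        return self.ensemble(merged.flatten(1)).view(-1, kp2d.size(1), 3)
```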
Image Information Assistance Neural Network for VideoPose3D-based Monocular 3D Pose Estimation
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511380
Hao Wang, Dingli Luo, T. Ikenaga
Abstract: 3D pose estimation based on a monocular camera can be applied to various fields such as human-computer interaction and human action recognition. As a two-stage 3D pose estimator, VideoPose3D achieves state-of-the-art accuracy. However, because of the limitations of two-stage processing, image information is partially lost when mapping 2D poses to 3D space, which limits the final accuracy. This paper proposes an image-assisting pose estimation model and a back-projection based offset generating module. The image-assisting pose estimation model consists of a 2D pose processing branch and an image processing branch. Image information is processed to generate an offset that refines the intermediate 3D pose produced by the 2D pose processing network. The back-projection based offset generating module projects the intermediate 3D poses to 2D space and calculates the error between the projection and the input 2D pose. From this error, combined with the extracted image feature, the neural network generates an offset that decreases the error. In evaluation, the accuracy on each action of the Human3.6M dataset improves by an average of 0.9 mm over the VideoPose3D baseline.
Citations: 0
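The back-projection step described above amounts to reprojecting the intermediate 3D pose and measuring its residual against the detected 2D pose. The sketch below shows that residual with a simple pinhole model and a hypothetical offset head (`OffsetHead`); the focal length, feature dimension, and layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def backprojection_error(pose3d, pose2d, f=1.0):
    """Per-joint 2D residual between a projected 3D pose and the input 2D pose.

    Uses an assumed pinhole model with focal length f and principal point at
    the origin; pose3d: (N, J, 3), pose2d: (N, J, 2).
    """
    z = pose3d[..., 2:3].clamp(min=1e-6)
    proj = f * pose3d[..., :2] / z                 # (N, J, 2) projection
    return proj - pose2d

class OffsetHead(nn.Module):
    """Hypothetical head: residual + image feature -> per-joint 3D offset."""

    def __init__(self, num_joints=17, feat_dim=256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_joints * 2 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 3))

    def forward(self, err2d, img_feat):            # err2d: (N, J, 2), img_feat: (N, feat_dim)
        x = torch.cat([err2d.flatten(1), img_feat], dim=1)
        return self.fc(x).view(err2d.size(0), -1, 3)
```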
On the Influence of Viewpoint Change for Metric Learning
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511344
Marco Filax, F. Ortmeier
Abstract: Physical objects imaged through a camera change their visual representation based on various factors, e.g., illumination, occlusion, or viewpoint changes. Thus, the inevitable goal in computer vision systems is to use mathematical representations of these objects that are robust to various changes and yet sufficient to capture even the minor differences that distinguish objects. However, finding these powerful representations is challenging if the amount of data is limited, such as in few-shot learning problems. In this work, we investigate the influence of viewpoint changes in modern recognition systems in the context of metric learning problems, in which fine-grained differences differentiate objects based on their learned numeric representation. Our results demonstrate that restricting the degrees of freedom, especially by fixing the virtual viewpoint using synthetic frontal views, elevates the overall performance. We expect that our observation of increased performance using rectified patches is persistent and reproducible in other scenarios.
Citations: 1
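The "synthetic frontal views" mentioned above are the kind of viewpoint normalization one can obtain by warping a known planar object region to a canonical frontal rectangle. The sketch below shows such a warp with OpenCV; the corner source (a detector or fiducials) and the output size are assumptions, and this is not the paper's exact preprocessing pipeline.

```python
import cv2
import numpy as np

def frontalize_patch(img, corners, out_size=(128, 128)):
    """Warp a planar object region to a synthetic frontal view (illustrative sketch).

    corners: four image points of the object plane, ordered top-left, top-right,
    bottom-right, bottom-left (assumed to be provided by a detector).
    """
    w, h = out_size
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)      # homography to the frontal view
    return cv2.warpPerspective(img, H, (w, h))     # rectified patch for the embedding network
```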
Encoding-free Incrementing Hough Transform for High Frame Rate and Ultra-low Delay Straight-line Detection
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511359
Ziwei Dong, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
Abstract: High frame rate and ultra-low delay straight-line detection plays an increasingly important role in highly automated factories, which rely on straight-line features to achieve swift localization in real scenes. However, vision systems based on CPU/GPU have a fixed delay between image capture and detection, making it challenging for straight-line detection to reach ultra-low delay. We therefore consider detection that runs nearly simultaneously with capture of the same image. This paper proposes (A) an encoding-free incrementing Hough transform and (B) a partially compressed line parameter space to implement a straight-line detection core on an FPGA board. The encoding-free incrementing Hough transform calculates line parameters directly, using only incrementing and initialization, while the image is being captured. Furthermore, the partially compressed line parameter space reduces the required memory resources and the path delay while still recording votes accurately for every line feature. The evaluation results show that the proposals achieve detection as accurate (RMSE of θ: 0.0057, RMSE of ρ: 2.09) as the standard Hough transform (RMSE of θ: 0.0057, RMSE of ρ: 2.13), and the designed detection core processes VGA (640 × 480) video at 1.358 ms/frame delay.
Citations: 1
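For reference, the standard (ρ, θ) Hough transform that the paper uses as its accuracy baseline can be written purely as per-pixel accumulator increments, which is the property that makes voting during image readout possible. The sketch below is that baseline, not the paper's encoding-free FPGA datapath; the bin counts are arbitrary choices.

```python
import numpy as np

def hough_vote(edge_points, img_shape, theta_bins=180, rho_bins=512):
    """Incremental (rho, theta) Hough voting over a list of edge pixels.

    edge_points: iterable of (x, y) pixel coordinates; returns the accumulator
    and the theta values so that peaks can be read off as detected lines.
    """
    h, w = img_shape
    rho_max = np.hypot(h, w)                       # largest possible |rho|
    thetas = np.linspace(0.0, np.pi, theta_bins, endpoint=False)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((rho_bins, theta_bins), dtype=np.int32)

    for x, y in edge_points:                       # one increment pass per incoming pixel
        rho = x * cos_t + y * sin_t                # rho for every theta bin at once
        idx = np.round((rho + rho_max) / (2 * rho_max) * (rho_bins - 1)).astype(int)
        acc[idx, np.arange(theta_bins)] += 1
    return acc, thetas
```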
Critically Compressed Quantized Convolution Neural Network based High Frame Rate and Ultra-Low Delay Fruit External Defects Detection
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511388
Jihang Zhang, Dongmei Huang, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
Abstract: High frame rate and ultra-low delay external defect detection plays a key role in high-efficiency, high-quality fruit product manufacturing. However, current commercial solutions based on traditional computer vision still lack the capability to detect most types of deceptive external defects. Although recent research has shown the great potential of deep learning for defect detection, solutions using large general-purpose CNNs are too slow to keep up with high-speed factory pipelines. This paper proposes a critically compressed separable convolution network and a bit depth degression quantization that further transforms the network for FPGA acceleration, making it possible to implement a CNN on a high frame rate and ultra-low delay vision system. With a minimal searched specialized structure, the critically compressed separable convolution network handles the external quality classification task with a minuscule number of parameters. By assigning degressive bit depths to different layers according to their bit depth importance, the customized quantization compresses the network more efficiently than the traditional method. The proposed network has 0.1% of the weight size of MobileNet (alpha = 0.25), while only a 1.54% drop in overall accuracy on the validation set is observed. The hardware estimation shows that the network classification unit can work at 0.672 ms delay at a resolution of 100×100, with up to 6 classification units running in parallel.
Citations: 0
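The notion of degressive bit depth can be illustrated with post-training uniform quantization where each layer gets its own, progressively smaller, bit width. The sketch below shows that mechanism; the particular 8/6/4-bit schedule and the stand-in weight tensors are hypothetical, and the paper's importance-based assignment and quantization-aware training are not reproduced here.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Stand-in conv kernels for three layers (shapes are illustrative only).
layer_weights = [np.random.randn(16, 3, 3, 3),
                 np.random.randn(32, 16, 3, 3),
                 np.random.randn(64, 32, 3, 3)]

# Degressive schedule: earlier layers keep more bits, later layers fewer.
bit_schedule = [8, 6, 4]
quantized = [quantize_uniform(w, b) for w, b in zip(layer_weights, bit_schedule)]
```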
Practical Descattering of Transmissive Inspection Using Slanted Linear Image Sensors
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511372
Takahiro Kushida, Kenichiro Tanaka, Takuya Funatomi, K. Tahara, Y. Kagawa, Y. Mukaigawa
Abstract: This paper presents an industry-ready descattering method that is easily applied to a food production line. The system consists of multiple sets, each comprising a linear image sensor and a linear light source, slanted at different angles. The images captured by these sensors, which are partially clear along the direction perpendicular to each sensor, are computationally integrated into a single clear image in the frequency domain. We assess the effectiveness of the proposed method in simulation and with our prototype system, which demonstrates the feasibility of the proposed method on an actual production line.
Citations: 0
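One generic way to combine several captures that are each clear along a different direction is to fuse them per spatial frequency, keeping whichever capture preserved that frequency best. The sketch below does exactly that with a max-magnitude selection rule; this is only an illustration of frequency-domain fusion and is not claimed to be the paper's reconstruction formula.

```python
import numpy as np

def fuse_directional_captures(images):
    """Fuse partially clear captures in the frequency domain (illustrative sketch).

    images: list of same-sized grayscale arrays, each sharp along one direction.
    For every spatial frequency, the coefficient with the largest magnitude
    across captures is kept, then the result is transformed back to the image domain.
    """
    spectra = np.stack([np.fft.fft2(img.astype(np.float64)) for img in images])
    pick = np.argmax(np.abs(spectra), axis=0)          # best capture per frequency
    fused = np.take_along_axis(spectra, pick[None], axis=0)[0]
    return np.real(np.fft.ifft2(fused))
```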
Live Video Action Recognition from Unsupervised Action Proposals
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511355
Roberto J. López-Sastre, Marcos Baptista-Ríos, F. J. Acevedo-Rodríguez, P. Martín-Martín, S. Maldonado-Bascón
Abstract: The problem of action detection in untrimmed videos consists in localizing those parts of a video that may contain an action. Typically, state-of-the-art approaches to this problem use a temporal action proposals (TAPs) generator followed by an action classifier module. Moreover, TAPs solutions are learned in a supervised setting and need the entire video to be processed before producing effective proposals. These properties become a limitation for real applications in which a system needs to know the content of the video in an online fashion. To address this, we introduce a live video action detection application which integrates the action classifier step with an unsupervised and online TAPs generator. We evaluate, for the first time, the precision of this novel pipeline for the problem of action detection in untrimmed videos. We offer a thorough experimental evaluation on the ActivityNet dataset, where our unsupervised model can compete with state-of-the-art supervised solutions.
Citations: 0
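The online proposal-plus-classifier structure of such a pipeline can be sketched as a streaming loop: a proposal opens when a per-frame actionness score crosses a threshold, closes when it drops, and is classified immediately. The threshold rule and the `classify` callable here are placeholders for illustration; the paper's unsupervised TAPs generator works differently.

```python
def online_action_detection(actionness_scores, classify, thr=0.5):
    """Toy online proposal + classification loop (illustration of the pipeline shape).

    actionness_scores: iterable of per-frame scores arriving in temporal order.
    classify: callable mapping a (start, end) frame segment to an action label.
    Returns a list of (start, end, label) detections produced online.
    """
    detections, start, t = [], None, -1
    for t, s in enumerate(actionness_scores):
        if s >= thr and start is None:
            start = t                              # a proposal opens online
        elif s < thr and start is not None:
            detections.append((start, t, classify(start, t)))
            start = None                           # proposal closes, classified at once
    if start is not None:                          # flush a proposal still open at stream end
        detections.append((start, t + 1, classify(start, t + 1)))
    return detections
```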
Action Spotting and Temporal Attention Analysis in Soccer Videos
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511342
H. Minoura, Tsubasa Hirakawa, Takayoshi Yamashita, H. Fujiyoshi, Mitsuru Nakazawa, Yeongnam Chae, B. Stenger
Abstract: Action spotting is the task of finding a specific action in a video. In this paper, we consider the task of spotting actions in soccer videos, e.g., goals, player substitutions, and card scenes, which are temporally sparse within a complete game. We spot actions using a Transformer model, which allows capturing important features before and after action scenes. Moreover, we analyze which time instances the model focuses on when predicting an action by observing the internal weights of the Transformer. Quantitative results on the public SoccerNet dataset show that the proposed method achieves an mAP of 81.6%, a significant improvement over previous methods. In addition, by analyzing the attention weights, we discover that the model focuses on different temporal neighborhoods for different actions.
Citations: 2
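A minimal Transformer-based spotter over a temporal window of per-frame features looks like the sketch below: self-attention over the window, then classification of the centre frame, with the attention weights available for the kind of temporal analysis the paper performs. The feature dimension, depth, head count, and the 17-class output are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpottingTransformerSketch(nn.Module):
    """Illustrative Transformer action spotter over a window of clip features."""

    def __init__(self, feat_dim=512, num_classes=17, num_layers=2, nhead=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                      # feats: (N, T, feat_dim)
        enc = self.encoder(feats)                  # self-attention across the window
        centre = enc[:, enc.size(1) // 2]          # classify the window's centre frame
        return self.head(centre)

# Usage sketch: scores = SpottingTransformerSketch()(torch.randn(4, 21, 512))
```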