2021 17th International Conference on Machine Vision and Applications (MVA): Latest Publications

Self-Supervised Deep Fisheye Image Rectification Approach using Coordinate Relations
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511349
Masaki Hosono, E. Simo-Serra, Tomonari Sonoda
Abstract: With the ascent of wearable camera, dashcam, and autonomous vehicle technology, fisheye lens cameras are becoming more widespread. Unlike regular cameras, videos and images taken with a fisheye lens suffer from significant lens distortion, which has detrimental effects on image processing algorithms. When the camera parameters are known, it is straightforward to correct the distortion; without known camera parameters, however, distortion correction becomes a non-trivial task. While learning-based approaches exist, they rely on complex datasets and have limited generalization. In this work, we propose a CNN-based approach that can be trained with readily available data. We exploit the fact that relationships between pixel coordinates remain stable after homogeneous distortions to design an efficient rectification model. Experiments performed on the Cityscapes dataset show the effectiveness of our approach. Our code is available at https://github.com/MasakHosono/SelfSupervisedFisheyeRectification.
Citations: 1
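As background for the entry above: once a distortion parameter is known (whether hand-calibrated or predicted by a network such as the one described), rectification reduces to a per-pixel coordinate remapping. The sketch below illustrates this with a simple one-parameter radial model; the model, the assumed focal length, and the function name are illustrative assumptions, not the paper's rectification network.

```python
import numpy as np

def rectify_radial(img, k, f=None):
    """Undistort an image under a one-parameter radial model (illustrative sketch).

    For each output (undistorted) pixel we look up its source location in the
    distorted input using x_d = x_u * (1 + k * r_u^2); the focal length f is an
    assumption (defaults to max(h, w) pixels), not a calibrated value.
    """
    h, w = img.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    f = float(f) if f is not None else float(max(h, w))

    ys, xs = np.mgrid[0:h, 0:w]
    xn, yn = (xs - cx) / f, (ys - cy) / f          # normalized output coordinates
    scale = 1.0 + k * (xn ** 2 + yn ** 2)          # radial distortion factor

    src_x = np.clip(np.rint(xn * scale * f + cx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(yn * scale * f + cy), 0, h - 1).astype(int)
    return img[src_y, src_x]                       # nearest-neighbour remapping
```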
A baseline for semi-supervised learning of efficient semantic segmentation models
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511402
I. Grubisic, Marin Orsic, Sinisa Segvic
Abstract: Semi-supervised learning is especially interesting in the dense prediction context due to the high cost of pixel-level ground truth. Unfortunately, most such approaches are evaluated on outdated architectures, which hampers research due to very slow training and high GPU RAM requirements. We address this concern by presenting a simple and effective baseline which works very well on both standard and efficient architectures. Our baseline is based on one-way consistency and nonlinear geometric and photometric perturbations. We show the advantage of perturbing only the student branch and present a plausible explanation of this behaviour. Experiments on Cityscapes and CIFAR-10 demonstrate competitive performance with respect to prior work.
Citations: 3
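To make the one-way consistency idea concrete, here is a minimal sketch of such a loss term on unlabeled images: the clean input produces a frozen teacher target and only the perturbed student branch receives gradients. It assumes photometric-only perturbations for simplicity (geometric perturbations would also require warping the target), and it is not the paper's exact training code.

```python
import torch
import torch.nn.functional as F

def one_way_consistency_loss(model, unlabeled, perturb):
    """One-way consistency on an unlabeled batch (illustrative sketch).

    unlabeled: (N, 3, H, W) images; perturb: photometric augmentation callable.
    """
    with torch.no_grad():                          # teacher branch: clean input, no gradient
        teacher_logits = model(unlabeled)          # (N, C, H, W)
        target = teacher_logits.argmax(dim=1)      # hard per-pixel pseudo-labels

    student_logits = model(perturb(unlabeled))     # student branch: perturbed input, trained
    return F.cross_entropy(student_logits, target)
```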
Occlusion-Robust 3D Hand Pose Estimation from a Single RGB Image
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511389
Asuka Ishii, Gaku Nakano, Tetsuo Inoshita
Abstract: We propose an occlusion-robust network for 3D hand pose estimation from a single RGB image. Severe occlusions degrade the estimation accuracy of not only occluded keypoints but also visible keypoints. Since existing methods based on deep neural networks perform convolutions on all keypoints regardless of visibility, inaccurate features from occluded keypoints affect the localization of visible keypoints. To suppress the influence of occluded keypoints, our proposed network consists of three modules: a 2D heatmap generator, a parallel sub-joints network (PSJNet), and an ensemble network (EN). First, the 2D positions of all keypoints in the input image are predicted as a 2D heatmap, as in existing methods. Then PSJNet, which consists of several graph convolutional networks (GCNs) running in parallel, estimates multiple incomplete 3D poses in which some of the keypoints have been removed. Each GCN performs convolutions on a limited number of keypoints, so features from occluded keypoints do not spread to the whole pose. Finally, EN merges the incomplete poses into a single 3D pose by selecting accurate positions from them. Experimental results on the public RHD dataset demonstrate that the proposed method outperforms existing methods under both small and severe occlusions.
Citations: 0
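The split-then-merge structure described above can be illustrated with a toy module: each branch lifts only a subset of 2D keypoints to 3D, so information from keypoints outside its subset cannot leak in, and a small head fuses the partial poses. This sketch substitutes plain MLPs for the paper's GCNs, uses disjoint keypoint subsets rather than the paper's overlapping removals, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ParallelSubJointsSketch(nn.Module):
    """Toy stand-in for the PSJNet + ensemble idea (illustrative only)."""

    def __init__(self, num_joints=21, subsets=((0, 5), (5, 13), (13, 21))):
        super().__init__()
        self.subsets = subsets
        # One lifting branch per keypoint subset (MLPs in place of GCNs).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear((b - a) * 2, 128), nn.ReLU(),
                          nn.Linear(128, (b - a) * 3))
            for a, b in subsets)
        # Ensemble head fusing the concatenated partial 3D poses.
        self.ensemble = nn.Linear(num_joints * 3, num_joints * 3)

    def forward(self, kp2d):                       # kp2d: (N, num_joints, 2)
        parts = []
        for (a, b), branch in zip(self.subsets, self.branches):
            out = branch(kp2d[:, a:b].flatten(1))  # lift this subset only
            parts.append(out.view(-1, b - a, 3))
        merged = torch.cat(parts, dim=1)           # (N, num_joints, 3)
        return self.ensemble(merged.flatten(1)).view(-1, kp2d.size(1), 3)
```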
Image Information Assistance Neural Network for VideoPose3D-based Monocular 3D Pose Estimation
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511380
Hao Wang, Dingli Luo, T. Ikenaga
Abstract: 3D pose estimation based on a monocular camera can be applied to various fields such as human-computer interaction and human action recognition. As a two-stage 3D pose estimator, VideoPose3D achieves state-of-the-art accuracy. However, because of the limitations of two-stage processing, image information is partially lost when mapping 2D poses to 3D space, which limits the final accuracy. This paper proposes an image-assisting pose estimation model and a back-projection based offset generating module. The image-assisting pose estimation model consists of a 2D pose processing branch and an image processing branch. Image information is processed to generate an offset that refines the intermediate 3D pose produced by the 2D pose processing network. The back-projection based offset generating module projects the intermediate 3D poses to 2D space and calculates the error between the projection and the input 2D pose. From this error, combined with the extracted image feature, the neural network generates an offset that decreases the error. In evaluation, the accuracy on each action of the Human3.6M dataset improves by an average of 0.9 mm over the VideoPose3D baseline.
Citations: 0
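The back-projection step described above amounts to reprojecting the intermediate 3D pose and measuring its residual against the detected 2D pose. The sketch below shows that residual with a simple pinhole model and a hypothetical offset head (`OffsetHead`); the focal length, feature dimension, and layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def backprojection_error(pose3d, pose2d, f=1.0):
    """Per-joint 2D residual between a projected 3D pose and the input 2D pose.

    Uses an assumed pinhole model with focal length f and principal point at
    the origin; pose3d: (N, J, 3), pose2d: (N, J, 2).
    """
    z = pose3d[..., 2:3].clamp(min=1e-6)
    proj = f * pose3d[..., :2] / z                 # (N, J, 2) projection
    return proj - pose2d

class OffsetHead(nn.Module):
    """Hypothetical head: residual + image feature -> per-joint 3D offset."""

    def __init__(self, num_joints=17, feat_dim=256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_joints * 2 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 3))

    def forward(self, err2d, img_feat):            # err2d: (N, J, 2), img_feat: (N, feat_dim)
        x = torch.cat([err2d.flatten(1), img_feat], dim=1)
        return self.fc(x).view(err2d.size(0), -1, 3)
```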
On the Influence of Viewpoint Change for Metric Learning
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511344
Marco Filax, F. Ortmeier
Abstract: Physical objects imaged through a camera change their visual representation based on various factors, e.g., illumination, occlusion, or viewpoint changes. Thus, the inevitable goal in computer vision systems is to use mathematical representations of these objects that are robust to various changes and yet sufficient to capture even the minor differences that distinguish objects. However, finding these powerful representations is challenging if the amount of data is limited, such as in few-shot learning problems. In this work, we investigate the influence of viewpoint changes in modern recognition systems in the context of metric learning problems, in which fine-grained differences differentiate objects based on their learned numeric representation. Our results demonstrate that restricting the degrees of freedom, especially by fixing the virtual viewpoint using synthetic frontal views, elevates the overall performance. We expect that our observation of increased performance using rectified patches is persistent and reproducible in other scenarios.
Citations: 1
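The "synthetic frontal views" mentioned above are the kind of viewpoint normalization one can obtain by warping a known planar object region to a canonical frontal rectangle. The sketch below shows such a warp with OpenCV; the corner source (a detector or fiducials) and the output size are assumptions, and this is not the paper's exact preprocessing pipeline.

```python
import cv2
import numpy as np

def frontalize_patch(img, corners, out_size=(128, 128)):
    """Warp a planar object region to a synthetic frontal view (illustrative sketch).

    corners: four image points of the object plane, ordered top-left, top-right,
    bottom-right, bottom-left (assumed to be provided by a detector).
    """
    w, h = out_size
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)      # homography to the frontal view
    return cv2.warpPerspective(img, H, (w, h))     # rectified patch for the embedding network
```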
Encoding-free Incrementing Hough Transform for High Frame Rate and Ultra-low Delay Straight-line Detection
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511359
Ziwei Dong, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
Abstract: High frame rate and ultra-low delay straight-line detection plays an increasingly important role in highly automated factories, which rely on straight-line features to achieve swift localization in real scenes. However, vision systems based on CPU/GPU have a fixed delay between image capture and detection, making it challenging for straight-line detection to reach ultra-low delay. We therefore consider detection that runs nearly simultaneously with capture of the same image. This paper proposes (A) an encoding-free incrementing Hough transform and (B) a partially compressed line parameter space to implement a straight-line detection core on an FPGA board. The encoding-free incrementing Hough transform calculates line parameters directly, using only incrementing and initialization, while the image is being captured. Furthermore, the partially compressed line parameter space reduces the required memory resources and the path delay while still recording votes accurately for every line feature. The evaluation results show that the proposals achieve detection as accurate (RMSE of θ: 0.0057, RMSE of ρ: 2.09) as the standard Hough transform (RMSE of θ: 0.0057, RMSE of ρ: 2.13), and the designed detection core processes VGA (640 × 480) video at 1.358 ms/frame delay.
Citations: 1
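For reference, the standard (ρ, θ) Hough transform that the paper uses as its accuracy baseline can be written purely as per-pixel accumulator increments, which is the property that makes voting during image readout possible. The sketch below is that baseline, not the paper's encoding-free FPGA datapath; the bin counts are arbitrary choices.

```python
import numpy as np

def hough_vote(edge_points, img_shape, theta_bins=180, rho_bins=512):
    """Incremental (rho, theta) Hough voting over a list of edge pixels.

    edge_points: iterable of (x, y) pixel coordinates; returns the accumulator
    and the theta values so that peaks can be read off as detected lines.
    """
    h, w = img_shape
    rho_max = np.hypot(h, w)                       # largest possible |rho|
    thetas = np.linspace(0.0, np.pi, theta_bins, endpoint=False)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((rho_bins, theta_bins), dtype=np.int32)

    for x, y in edge_points:                       # one increment pass per incoming pixel
        rho = x * cos_t + y * sin_t                # rho for every theta bin at once
        idx = np.round((rho + rho_max) / (2 * rho_max) * (rho_bins - 1)).astype(int)
        acc[idx, np.arange(theta_bins)] += 1
    return acc, thetas
```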
Critically Compressed Quantized Convolution Neural Network based High Frame Rate and Ultra-Low Delay Fruit External Defects Detection
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511388
Jihang Zhang, Dongmei Huang, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
Abstract: High frame rate and ultra-low delay external defect detection plays a key role in high-efficiency, high-quality fruit product manufacturing. However, current commercial solutions based on traditional computer vision still lack the capability to detect most types of deceptive external defects. Although recent research has shown the great potential of deep learning for defect detection, solutions using large general-purpose CNNs are too slow to keep up with high-speed factory pipelines. This paper proposes a critically compressed separable convolution network and a bit depth degression quantization that further transforms the network for FPGA acceleration, making it possible to implement a CNN on a high frame rate and ultra-low delay vision system. With a minimal searched specialized structure, the critically compressed separable convolution network handles the external quality classification task with a minuscule number of parameters. By assigning degressive bit depths to different layers according to their bit depth importance, the customized quantization compresses the network more efficiently than the traditional method. The proposed network has 0.1% of the weight size of MobileNet (alpha = 0.25), while only a 1.54% drop in overall accuracy on the validation set is observed. The hardware estimation shows that the network classification unit can work at 0.672 ms delay at a resolution of 100×100, with up to 6 classification units running in parallel.
Citations: 0
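The notion of degressive bit depth can be illustrated with post-training uniform quantization where each layer gets its own, progressively smaller, bit width. The sketch below shows that mechanism; the particular 8/6/4-bit schedule and the stand-in weight tensors are hypothetical, and the paper's importance-based assignment and quantization-aware training are not reproduced here.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Stand-in conv kernels for three layers (shapes are illustrative only).
layer_weights = [np.random.randn(16, 3, 3, 3),
                 np.random.randn(32, 16, 3, 3),
                 np.random.randn(64, 32, 3, 3)]

# Degressive schedule: earlier layers keep more bits, later layers fewer.
bit_schedule = [8, 6, 4]
quantized = [quantize_uniform(w, b) for w, b in zip(layer_weights, bit_schedule)]
```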
Practical Descattering of Transmissive Inspection Using Slanted Linear Image Sensors
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511372
Takahiro Kushida, Kenichiro Tanaka, Takuya Funatomi, K. Tahara, Y. Kagawa, Y. Mukaigawa
Abstract: This paper presents an industry-ready descattering method that is easily applied to a food production line. The system consists of multiple sets, each comprising a linear image sensor and a linear light source, slanted at different angles. The images captured by these sensors, which are partially clear along the direction perpendicular to each sensor, are computationally integrated into a single clear image in the frequency domain. We assess the effectiveness of the proposed method in simulation and with our prototype system, which demonstrates the feasibility of the proposed method on an actual production line.
Citations: 0
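One generic way to combine several captures that are each clear along a different direction is to fuse them per spatial frequency, keeping whichever capture preserved that frequency best. The sketch below does exactly that with a max-magnitude selection rule; this is only an illustration of frequency-domain fusion and is not claimed to be the paper's reconstruction formula.

```python
import numpy as np

def fuse_directional_captures(images):
    """Fuse partially clear captures in the frequency domain (illustrative sketch).

    images: list of same-sized grayscale arrays, each sharp along one direction.
    For every spatial frequency, the coefficient with the largest magnitude
    across captures is kept, then the result is transformed back to the image domain.
    """
    spectra = np.stack([np.fft.fft2(img.astype(np.float64)) for img in images])
    pick = np.argmax(np.abs(spectra), axis=0)          # best capture per frequency
    fused = np.take_along_axis(spectra, pick[None], axis=0)[0]
    return np.real(np.fft.ifft2(fused))
```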
Live Video Action Recognition from Unsupervised Action Proposals
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511355
Roberto J. López-Sastre, Marcos Baptista-Ríos, F. J. Acevedo-Rodríguez, P. Martín-Martín, S. Maldonado-Bascón
Abstract: The problem of action detection in untrimmed videos consists in localizing those parts of a video that may contain an action. Typically, state-of-the-art approaches to this problem use a temporal action proposals (TAPs) generator followed by an action classifier module. Moreover, TAPs solutions are learned in a supervised setting and need the entire video to be processed before producing effective proposals. These properties become a limitation for real applications in which a system needs to know the content of the video in an online fashion. To address this, we introduce a live video action detection application which integrates the action classifier step with an unsupervised and online TAPs generator. We evaluate, for the first time, the precision of this novel pipeline for the problem of action detection in untrimmed videos. We offer a thorough experimental evaluation on the ActivityNet dataset, where our unsupervised model can compete with state-of-the-art supervised solutions.
Citations: 0
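The online proposal-plus-classifier structure of such a pipeline can be sketched as a streaming loop: a proposal opens when a per-frame actionness score crosses a threshold, closes when it drops, and is classified immediately. The threshold rule and the `classify` callable here are placeholders for illustration; the paper's unsupervised TAPs generator works differently.

```python
def online_action_detection(actionness_scores, classify, thr=0.5):
    """Toy online proposal + classification loop (illustration of the pipeline shape).

    actionness_scores: iterable of per-frame scores arriving in temporal order.
    classify: callable mapping a (start, end) frame segment to an action label.
    Returns a list of (start, end, label) detections produced online.
    """
    detections, start, t = [], None, -1
    for t, s in enumerate(actionness_scores):
        if s >= thr and start is None:
            start = t                              # a proposal opens online
        elif s < thr and start is not None:
            detections.append((start, t, classify(start, t)))
            start = None                           # proposal closes, classified at once
    if start is not None:                          # flush a proposal still open at stream end
        detections.append((start, t + 1, classify(start, t + 1)))
    return detections
```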
Action Spotting and Temporal Attention Analysis in Soccer Videos
2021 17th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-07-25. DOI: 10.23919/MVA51890.2021.9511342
H. Minoura, Tsubasa Hirakawa, Takayoshi Yamashita, H. Fujiyoshi, Mitsuru Nakazawa, Yeongnam Chae, B. Stenger
Abstract: Action spotting is the task of finding a specific action in a video. In this paper, we consider the task of spotting actions in soccer videos, e.g., goals, player substitutions, and card scenes, which are temporally sparse within a complete game. We spot actions using a Transformer model, which allows capturing important features before and after action scenes. Moreover, we analyze which time instances the model focuses on when predicting an action by observing the internal weights of the Transformer. Quantitative results on the public SoccerNet dataset show that the proposed method achieves an mAP of 81.6%, a significant improvement over previous methods. In addition, by analyzing the attention weights, we discover that the model focuses on different temporal neighborhoods for different actions.
Citations: 2
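A minimal Transformer-based spotter over a temporal window of per-frame features looks like the sketch below: self-attention over the window, then classification of the centre frame, with the attention weights available for the kind of temporal analysis the paper performs. The feature dimension, depth, head count, and the 17-class output are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpottingTransformerSketch(nn.Module):
    """Illustrative Transformer action spotter over a window of clip features."""

    def __init__(self, feat_dim=512, num_classes=17, num_layers=2, nhead=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                      # feats: (N, T, feat_dim)
        enc = self.encoder(feats)                  # self-attention across the window
        centre = enc[:, enc.size(1) // 2]          # classify the window's centre frame
        return self.head(centre)

# Usage sketch: scores = SpottingTransformerSketch()(torch.randn(4, 21, 512))
```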