2021 17th International Conference on Machine Vision and Applications (MVA): Latest Publications

Bi-directional Recurrent MVSNet for High-resolution Multi-view Stereo
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511358
Taku Fujitomi, Seiya Ito, Naoshi Kaneko, K. Sumi
{"title":"Bi-directional Recurrent MVSNet for High-resolution Multi-view Stereo","authors":"Taku Fujitomi, Seiya Ito, Naoshi Kaneko, K. Sumi","doi":"10.23919/MVA51890.2021.9511358","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511358","url":null,"abstract":"Learning-based multi-view stereo regularizes cost volumes containing spatial information to reduce noise and improve the quality of a depth map. Cost volume regularization using 3D CNNs consumes a large amount of memory, making it difficult to scale up the network architecture. Recent work proposed a cost-volume regularization method that applies 2D convolutional GRUs and significantly reduces memory consumption. However, this uni-directional recurrent processing has a narrower receptive field than 3D CNNs because the regularized cost at a time step does not contain information about future time steps. In this paper, we propose a cost volume regularization method using bi-directional GRUs that expands the receptive field in the depth direction. In our experiments, our proposed method significantly outperforms the conventional methods in several benchmarks while maintaining low memory consumption.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121050218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
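The core idea, scanning the cost volume's depth slices with a recurrent unit in both directions so that each regularized slice sees both nearer and farther depth hypotheses, can be illustrated in a few lines. The sketch below is not the authors' code: the ConvGRU cell design, channel sizes, and the 1x1 fusion convolution are assumptions made for illustration.

```python
# Minimal sketch of bi-directional ConvGRU cost-volume regularization (PyTorch).
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """2D convolutional GRU cell operating on one depth slice of a cost volume."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)  # update/reset
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        return (1 - z) * h + z * torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))

class BiGRURegularizer(nn.Module):
    """Regularizes a cost volume (B, C, D, H, W) by scanning its depth slices in
    both directions, expanding the receptive field in the depth direction."""
    def __init__(self, in_ch=8, hid_ch=8):
        super().__init__()
        self.hid = hid_ch
        self.fwd = ConvGRUCell(in_ch, hid_ch)
        self.bwd = ConvGRUCell(in_ch, hid_ch)
        self.fuse = nn.Conv2d(2 * hid_ch, 1, 1)  # fuse both directions per slice

    def forward(self, cost):
        B, C, D, H, W = cost.shape
        hf = cost.new_zeros(B, self.hid, H, W)
        hb = cost.new_zeros(B, self.hid, H, W)
        f_states, b_states = [], [None] * D
        for d in range(D):                       # near-to-far pass
            hf = self.fwd(cost[:, :, d], hf)
            f_states.append(hf)
        for d in reversed(range(D)):             # far-to-near pass
            hb = self.bwd(cost[:, :, d], hb)
            b_states[d] = hb
        out = [self.fuse(torch.cat([f, b], dim=1)) for f, b in zip(f_states, b_states)]
        return torch.stack(out, dim=2)           # (B, 1, D, H, W) regularized costs
```

Depth probabilities can then be taken per pixel, e.g. `torch.softmax(-reg.squeeze(1), dim=1)`, keeping memory per step at one 2D slice rather than the whole 3D volume.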
Contextual Information based Network with High-Frequency Feature Fusion for High Frame Rate and Ultra-Low Delay Small-Scale Object Detection
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511387
Dongmei Huang, Jihang Zhang, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
{"title":"Contextual Information based Network with High-Frequency Feature Fusion for High Frame Rate and Ultra-Low Delay Small-Scale Object Detection","authors":"Dongmei Huang, Jihang Zhang, Tingting Hu, Ryuji Fuchikami, T. Ikenaga","doi":"10.23919/MVA51890.2021.9511387","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511387","url":null,"abstract":"High frame rate and ultra-low delay small-scale object detection plays an important role in factory automation for its timely and accurate reaction. Although many CNN based detection methods have been proposed to improve the accuracy of small object detection for the low resolution and large gap between the object and the background, it is difficult to achieve a trade-off between accuracy and speed. For the pursuit of ultra-low delay processing by utilizing FPGA, this paper proposes: (A) IoU and distance based loss function, (B) Contextual information with high temporal correlation based parallel detection, (C) High frequency feature fusion for enhancing low-bit networks. The proposed methods achieve 45.3 % mAP for test sequences, which is only 0.7 % mAP lower compared with the general method. Meanwhile, the size of the model has been compressed to 1.94 % of the original size and reaches a speed of 278 fPs on FPGA and 15 fPs on GPU.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"17 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126034196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
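The abstract does not give the exact form of the IoU- and distance-based loss, so the sketch below follows the common DIoU-style pattern (an IoU term plus a center-distance penalty normalized by the enclosing-box diagonal) purely as an illustrative assumption.

```python
# Sketch of an IoU + center-distance box regression loss (DIoU-style assumption).
# pred, target: (N, 4) tensors of boxes as (x1, y1, x2, y2).
import torch

def iou_distance_loss(pred, target, eps=1e-7):
    # Intersection and union areas for the IoU term
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared distance between box centers, normalized by the diagonal of the
    # smallest enclosing box so the penalty is scale-invariant
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    center_d2 = ((cp - ct) ** 2).sum(dim=1)
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    return (1 - iou + center_d2 / diag2).mean()
```

For small objects the distance term matters most: when boxes barely overlap, plain IoU gives near-zero gradient, while the center-distance penalty still points the regressor toward the target.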
Selecting an Iconic Pose From an Action Video
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511347
Geethu Miriam Jacob, B. Stenger
{"title":"Selecting an Iconic Pose From an Action Video","authors":"Geethu Miriam Jacob, B. Stenger","doi":"10.23919/MVA51890.2021.9511347","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511347","url":null,"abstract":"This paper presents a method for selecting an iconic pose frame from an action video. An iconic pose frame is a frame showing a representative pose, distinct from other actions. We first extract a diverse set of keyframes from the video using unsupervised video summarization. A classification loss ensures that the selected frames retain high action classification accuracy. To find iconic poses, we introduce two loss terms, an Extreme Pose Loss, encouraging selecting poses far from the mean pose, and a Frame Contrastive Loss, which encourages poses from the same action to be similar. In a user preference study on UCF-101 videos we show that the automatically selected iconic pose keyframes are preferred to manually selected ones in 48% of cases.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123566885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
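A minimal sketch of how the two named loss terms might look, under assumed definitions: the Extreme Pose Loss rewards distance from the batch mean pose, and the Frame Contrastive Loss pulls same-action poses together while pushing different actions at least a margin apart. The exact formulations are not given in the abstract.

```python
# Sketch of the two loss terms under assumed definitions (PyTorch).
# poses: (N, D) pose feature vectors; action_labels: (N,) integer labels.
import torch
import torch.nn.functional as F

def extreme_pose_loss(poses):
    """Lower loss for poses far from the mean pose (in practice this term is
    weighted against the others, since it is unbounded on its own)."""
    mean_pose = poses.mean(dim=0, keepdim=True)
    return -((poses - mean_pose) ** 2).sum(dim=1).mean()

def frame_contrastive_loss(poses, action_labels, margin=1.0):
    """Same-action pairs should be close; different-action pairs at least
    `margin` apart."""
    d = torch.cdist(poses, poses)                          # (N, N) pairwise distances
    same = action_labels.unsqueeze(0) == action_labels.unsqueeze(1)
    eye = torch.eye(len(poses), dtype=torch.bool, device=poses.device)
    pos = d[same & ~eye]                                   # same action, different frame
    neg = F.relu(margin - d[~same])                        # different actions
    pos_term = pos.mean() if pos.numel() else poses.new_zeros(())
    neg_term = neg.mean() if neg.numel() else poses.new_zeros(())
    return pos_term + neg_term
```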
Attention Mining Branch for Optimizing Attention Map
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511357
Takaaki Iwayoshi, Masahiro Mitsuhara, Masayuki Takada, Tsubasa Hirakawa, Takayoshi Yamashita, H. Fujiyoshi
{"title":"Attention Mining Branch for Optimizing Attention Map","authors":"Takaaki Iwayoshi, Masahiro Mitsuhara, Masayuki Takada, Tsubasa Hirakawa, Takayoshi Yamashita, H. Fujiyoshi","doi":"10.23919/MVA51890.2021.9511357","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511357","url":null,"abstract":"Attention branch networks (ABNs) can achieve high accuracy by visualizing the attention area of the network during inference and utilizing it in the recognition process. However, if the attention area does not highlight the target object to be recognized, it may cause recognition failure. While there is a method for fine-tuning the ABN using attention maps modified by human knowledge, it takes up a lot of labor and time because the attention map needs to be modified manually. In this paper, we propose a method that automatically optimizes the attention map by introducing an attention mining branch to the ABN. Our evaluation experiments show that the proposed method improves the recognition accuracy and obtains an attention map that appropriately focuses on the target object to be recognized.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125555759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
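The abstract does not spell out how the attention mining branch optimizes the map. The sketch below follows the common attention-mining formulation (mask out the attended region and penalize any remaining evidence for the ground-truth class, as in GAIN) as an assumption; whether the ABN branch uses exactly this form is not confirmed by the abstract. The `model` returning a (logits, attention map) pair mimics the ABN interface.

```python
# Sketch of an attention-mining objective (GAIN-style masking assumption).
import torch

def attention_mining_loss(model, images, labels, sigma=0.5, omega=10.0):
    """model(images) -> (logits, att) with att of shape (B, 1, H, W) in [0, 1]."""
    _, att = model(images)
    # Soft mask: highly attended pixels get suppressed in the masked image
    mask = torch.sigmoid(omega * (att - sigma))
    masked = images * (1 - mask)
    masked_logits, _ = model(masked)
    # If the attention map covers all class evidence, the masked image should no
    # longer support the ground-truth class, so minimize its softmax score.
    probs = torch.softmax(masked_logits, dim=1)
    return probs.gather(1, labels.unsqueeze(1)).mean()
```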
Saliency based Subject Selection for Diverse Image Captioning
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511360
Quoc-An Luong, Duc Minh Vo, A. Sugimoto
{"title":"Saliency based Subject Selection for Diverse Image Captioning","authors":"Quoc-An Luong, Duc Minh Vo, A. Sugimoto","doi":"10.23919/MVA51890.2021.9511360","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511360","url":null,"abstract":"Image captioning has drawn more and more attention because of its practical usefulness in many multimedia applications. Multiple criteria such as accuracy, detail or diversity exist to evaluate the quality of generated captions. Among them, diversity is the most difficult because for a given image, its multiple captions should be generated while retaining their accuracy. We approach to diverse image captioning by explicitly selecting objects in an image one by one as a subject in generating captions. Our method has three main steps: (1) After generating scene graph of a given image, we first give selection priority to the nodes (namely, subjects) in the scene graph based on the size and visual saliency of objects. (2) With a selected subject, we prune a portion of the scene graph structure that is irrelevant to the subject to have subject-oriented scene graph for accurate captioning. (3) We convert the subject-oriented scene graph into its more sentence-friendly abstract meaning representation (AMR) to generate the caption whose the subject is the selected root. In this way, we can generate captions whose subjects are different from each other, achieving diversity. Our proposed method achieves comparable results with other methods in both diversity and accuracy.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"5 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129417177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
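Step (1), ranking scene-graph object nodes by size and visual saliency to choose caption subjects, might look like the sketch below. The scoring function (area fraction times mean saliency inside the box) is an assumption; the abstract only states that priority is based on size and saliency.

```python
# Sketch of saliency- and size-based subject priority for scene-graph nodes.
import numpy as np

def subject_priority(nodes, saliency_map):
    """nodes: list of dicts with 'name' and 'box' (x1, y1, x2, y2) in integer pixels.
    saliency_map: (H, W) array with values in [0, 1].
    Returns the nodes sorted by descending subject priority."""
    H, W = saliency_map.shape
    scored = []
    for node in nodes:
        x1, y1, x2, y2 = node["box"]
        area_frac = max(0, x2 - x1) * max(0, y2 - y1) / (H * W)  # object size
        region = saliency_map[y1:y2, x1:x2]
        sal = float(region.mean()) if region.size else 0.0      # visual saliency
        scored.append((area_frac * sal, node))
    return [n for _, n in sorted(scored, key=lambda s: s[0], reverse=True)]

# e.g. subjects = subject_priority(graph_nodes, sal_map); captions are then
# generated for each subject in turn, yielding one caption per subject.
```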
Facial landmark detection transfer learning for a specific user in driver status monitoring systems
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511385
Jaechul Kim, K. Taguchi, Yusuke Hayashi, Jungo Miyazaki, H. Fujiyoshi
{"title":"Facial landmark detection transfer learning for a specific user in driver status monitoring systems","authors":"Jaechul Kim, K. Taguchi, Yusuke Hayashi, Jungo Miyazaki, H. Fujiyoshi","doi":"10.23919/MVA51890.2021.9511385","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511385","url":null,"abstract":"The wide variety of human faces make it nearly impossible to prepare a complete training data set for facial landmark detection. Because of this, the performance of facial landmark detection is unlikely to be sufficient for driver status monitoring (DSM) systems. To improve the performance for a specific person (SP) by collecting data about that person, we propose the generator and discriminator model using the Lucas-Kanade assistance (GDA) algorithm for compiling a training data set. Even when data for a specific user can be collected, another issue is how to efficiently, effectively, and quickly re-train the model using an insufficient data set. To address this problem, we propose a novel method of transfer learning in the context of composite backbone networks (GBNet). The assistant backbone of GBNet is trained on a large unspecified people (USP) data set in the source domain and transfers its representation to the lead backbone, which is trained by a small SP data set in the target domain. In addition, we design an assistance loss function with output that is not only close to the SP data set, but also consistent with a USP data set with respect to labeled images. We test the proposed method using the 300 Videos in the Wild (300VW) data set and our own data set. Furthermore, show that the proposed method improves the stability of predictions. We expect our method to contribute to the realization of stable DSM systems.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126663479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
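A minimal sketch of the assistance loss as described: the lead backbone is fitted to the small SP labels while being kept consistent with the frozen assistant backbone trained on the large USP set. The L2 forms and the weighting factor are assumptions.

```python
# Sketch of an assistance loss combining SP supervision with USP consistency.
import torch

def assistance_loss(lead_pred, sp_target, assistant_pred, alpha=0.5):
    """lead_pred, assistant_pred, sp_target: (B, K, 2) landmark coordinates.
    alpha weights the consistency term (value assumed for illustration)."""
    sp_term = ((lead_pred - sp_target) ** 2).mean()                 # fit SP labels
    consist = ((lead_pred - assistant_pred.detach()) ** 2).mean()   # stay near USP model
    return sp_term + alpha * consist
```

Detaching the assistant's prediction keeps the USP-trained backbone fixed, so the small SP set fine-tunes only the lead backbone without drifting away from the general-face representation.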
AVM Image Quality Enhancement by Synthetic Image Learning for Supervised Deblurring
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511398
Kazutoshi Akita, Masayoshi Hayama, Haruya Kyutoku, N. Ukita
{"title":"AVM Image Quality Enhancement by Synthetic Image Learning for Supervised Deblurring","authors":"Kazutoshi Akita, Masayoshi Hayama, Haruya Kyutoku, N. Ukita","doi":"10.23919/MVA51890.2021.9511398","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511398","url":null,"abstract":"An Around View Monitoring (AVM) system is widely used to allow a driver to watch the situation around a car. The AVM image is generated by image distortion correction and viewpoint transformation for images captured by wide view-angle cameras installed on the car. However, the AVM image is blurred due to these transformations. This blur impairs the visibility of the driver. While many deblurring methods based on CNN have been proposed, these general-purpose de-blurring methods are not designed for the AVM image. (1) Since the blur level in the AVM image is region-dependent, deblurring for the AVM should also be region-dependent. (2) Furthermore, while supervised deblurring methods require a pair of input-blurred and output-deblurred images, it is not easy to collect the deblurred AVM image. This paper proposes a method for generating the pairs of training images that cope with the aforementioned two problems. These training images are generated by the inverse transformation of the AVM image generation process. Experimental results show that our method can suppress blur on AVM images. We also confirmed that even a very shallow CNN with the inference time of 2.1ms has the same performance as the SoTA model.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125652588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
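The pair-generation idea can be sketched as below: a sharp bird's-eye-view image is pushed through an inverse AVM-style transform to a simulated camera view and then back, so the two resampling steps reproduce the region-dependent AVM blur while the original image serves as the sharp target. The single homography here is a stand-in for the real distortion correction plus viewpoint transform, which is camera- and car-specific.

```python
# Sketch of synthetic blurred/sharp training pair generation for AVM deblurring.
import cv2
import numpy as np

def make_training_pair(sharp_bev, H_bev_to_cam, cam_size):
    """sharp_bev: sharp bird's-eye-view image (the training target).
    H_bev_to_cam: 3x3 homography approximating the inverse AVM transform.
    cam_size: (width, height) of the simulated camera image.
    Returns (blurred_input, sharp_target)."""
    h, w = sharp_bev.shape[:2]
    # Inverse transform: bird's-eye view -> simulated camera image
    cam_img = cv2.warpPerspective(sharp_bev, H_bev_to_cam, cam_size,
                                  flags=cv2.INTER_LINEAR)
    # Forward transform back, as the real AVM pipeline would; the two
    # interpolation steps introduce the region-dependent AVM blur
    blurred_bev = cv2.warpPerspective(cam_img, np.linalg.inv(H_bev_to_cam), (w, h),
                                      flags=cv2.INTER_LINEAR)
    return blurred_bev, sharp_bev
```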
Japanese Sentence Dataset for Lip-reading
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511353
Tatsuya Shirakata, T. Saitoh
{"title":"Japanese Sentence Dataset for Lip- reading","authors":"Tatsuya Shirakata, T. Saitoh","doi":"10.23919/MVA51890.2021.9511353","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511353","url":null,"abstract":"This research is about lip-reading for Japanese sentences. Research on English sentences is actively pursued due to the extensive datasets. However, a sufficient dataset for Japanese sentences has not been released. Therefore, this paper builds a Japanese sentence dataset. A Transformer model is used for the recognition task. Three recognition target levels: phoneme, mora, and vowel, are set, and recognition experiments show that they can be recognized.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131674746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Multi-physical and Temporal Feature Based Self-correcting Approximation Model for Monocular 3D Volleyball Trajectory Analysis
Pub Date: 2021-07-25 · DOI: 10.23919/MVA51890.2021.9511408
J. Dong, Xina Cheng, T. Ikenaga
{"title":"Multi-physical and Temporal Feature Based Self-correcting Approximation Model for Monocular 3D Volleyball Trajectory Analysis","authors":"J. Dong, Xina Cheng, T. Ikenaga","doi":"10.23919/MVA51890.2021.9511408","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511408","url":null,"abstract":"Benefiting from the low venue requirements and deployment cost, analysis of 3D volleyball trajectory from monocular vision sensor is of important significance to volleyball game analysis and training assisting. Because of the monocular vision limitation, complicated ball trajectory caused by physical factors and model drifting owing to distance information loss are two governing challenges. This paper proposes a multi-physical factors and self-cor-recting trajectory approximation model. Also, a trajectory correction algorithm based on temporal motion features is proposed. For the first challenge, air resistance factor and gravity factor which mostly impact volleyball during flying are considered to simulate ball motion status. The approximation model parameters are evaluated and corrected during model calculating to reduce calculation error. To limiting model drifting, volleyball movement characteristics based on temporal motion feature is applied to correct approximated trajectory. The success rate of proposed monocular 3D trajectory approximation method achieves 82.5% which has 47.0% improvement comparing with conventional work.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133851754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
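The physical part of the approximation model, simulating the ball under gravity and air resistance, can be sketched as below. The quadratic-drag form, the coefficient values, and the Euler integration step are assumed for illustration; the paper's parameter correction and temporal self-correction steps are not shown.

```python
# Sketch of ball flight simulation under gravity and quadratic air drag.
import numpy as np

def simulate_trajectory(p0, v0, k_drag=0.015, mass=0.27, g=9.81, dt=1/60, steps=120):
    """p0, v0: initial 3D position (m) and velocity (m/s), z pointing up.
    Returns an array of shape (steps, 3) of simulated positions."""
    p, v = np.asarray(p0, float), np.asarray(v0, float)
    traj = []
    for _ in range(steps):
        speed = np.linalg.norm(v)
        a_drag = -k_drag * speed * v / mass      # quadratic drag opposes motion
        a = a_drag + np.array([0.0, 0.0, -g])    # gravity along -z
        v = v + a * dt                           # explicit Euler integration
        p = p + v * dt
        traj.append(p.copy())
    return np.array(traj)

# e.g. a serve: simulate_trajectory([0.0, 0.0, 2.5], [8.0, 0.5, 3.0])
```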
Proceedings of MVA 2021 17th International Conference on Machine Vision Applications
Pub Date: 2021-07-25 · DOI: 10.23919/mva51890.2021.9511373
{"title":"Proceedings of MVA 2021 17th International Conference on Machine Vision Applications","authors":"","doi":"10.23919/mva51890.2021.9511373","DOIUrl":"https://doi.org/10.23919/mva51890.2021.9511373","url":null,"abstract":"","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124075254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0