{"title":"Bi-directional Recurrent MVSNet for High-resolution Multi-view Stereo","authors":"Taku Fujitomi, Seiya Ito, Naoshi Kaneko, K. Sumi","doi":"10.23919/MVA51890.2021.9511358","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511358","url":null,"abstract":"Learning-based multi-view stereo regularizes cost volumes containing spatial information to reduce noise and improve the quality of a depth map. Cost volume regularization using 3D CNNs consumes a large amount of memory, making it difficult to scale up the network architecture. Recent work proposed a cost-volume regularization method that applies 2D convolutional GRUs and significantly reduces memory consumption. However, this uni-directional recurrent processing has a narrower receptive field than 3D CNNs because the regularized cost at a time step does not contain information about future time steps. In this paper, we propose a cost volume regularization method using bi-directional GRUs that expands the receptive field in the depth direction. In our experiments, our proposed method significantly outperforms the conventional methods in several benchmarks while maintaining low memory consumption.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121050218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Information based Network with High-Frequency Feature Fusion for High Frame Rate and Ultra-Low Delay Small-Scale Object Detection","authors":"Dongmei Huang, Jihang Zhang, Tingting Hu, Ryuji Fuchikami, T. Ikenaga","doi":"10.23919/MVA51890.2021.9511387","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511387","url":null,"abstract":"High frame rate and ultra-low delay small-scale object detection plays an important role in factory automation for its timely and accurate reaction. Although many CNN based detection methods have been proposed to improve the accuracy of small object detection for the low resolution and large gap between the object and the background, it is difficult to achieve a trade-off between accuracy and speed. For the pursuit of ultra-low delay processing by utilizing FPGA, this paper proposes: (A) IoU and distance based loss function, (B) Contextual information with high temporal correlation based parallel detection, (C) High frequency feature fusion for enhancing low-bit networks. The proposed methods achieve 45.3 % mAP for test sequences, which is only 0.7 % mAP lower compared with the general method. Meanwhile, the size of the model has been compressed to 1.94 % of the original size and reaches a speed of 278 fPs on FPGA and 15 fPs on GPU.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"17 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126034196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selecting an Iconic Pose From an Action Video","authors":"Geethu Miriam Jacob, B. Stenger","doi":"10.23919/MVA51890.2021.9511347","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511347","url":null,"abstract":"This paper presents a method for selecting an iconic pose frame from an action video. An iconic pose frame is a frame showing a representative pose, distinct from other actions. We first extract a diverse set of keyframes from the video using unsupervised video summarization. A classification loss ensures that the selected frames retain high action classification accuracy. To find iconic poses, we introduce two loss terms, an Extreme Pose Loss, encouraging selecting poses far from the mean pose, and a Frame Contrastive Loss, which encourages poses from the same action to be similar. In a user preference study on UCF-101 videos we show that the automatically selected iconic pose keyframes are preferred to manually selected ones in 48% of cases.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123566885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention Mining Branch for Optimizing Attention Map","authors":"Takaaki Iwayoshi, Masahiro Mitsuhara, Masayuki Takada, Tsubasa Hirakawa, Takayoshi Yamashita, H. Fujiyoshi","doi":"10.23919/MVA51890.2021.9511357","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511357","url":null,"abstract":"Attention branch networks (ABNs) can achieve high accuracy by visualizing the attention area of the network during inference and utilizing it in the recognition process. However, if the attention area does not highlight the target object to be recognized, it may cause recognition failure. While there is a method for fine-tuning the ABN using attention maps modified by human knowledge, it takes up a lot of labor and time because the attention map needs to be modified manually. In this paper, we propose a method that automatically optimizes the attention map by introducing an attention mining branch to the ABN. Our evaluation experiments show that the proposed method improves the recognition accuracy and obtains an attention map that appropriately focuses on the target object to be recognized.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125555759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Saliency based Subject Selection for Diverse Image Captioning","authors":"Quoc-An Luong, Duc Minh Vo, A. Sugimoto","doi":"10.23919/MVA51890.2021.9511360","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511360","url":null,"abstract":"Image captioning has drawn more and more attention because of its practical usefulness in many multimedia applications. Multiple criteria such as accuracy, detail or diversity exist to evaluate the quality of generated captions. Among them, diversity is the most difficult because for a given image, its multiple captions should be generated while retaining their accuracy. We approach to diverse image captioning by explicitly selecting objects in an image one by one as a subject in generating captions. Our method has three main steps: (1) After generating scene graph of a given image, we first give selection priority to the nodes (namely, subjects) in the scene graph based on the size and visual saliency of objects. (2) With a selected subject, we prune a portion of the scene graph structure that is irrelevant to the subject to have subject-oriented scene graph for accurate captioning. (3) We convert the subject-oriented scene graph into its more sentence-friendly abstract meaning representation (AMR) to generate the caption whose the subject is the selected root. In this way, we can generate captions whose subjects are different from each other, achieving diversity. Our proposed method achieves comparable results with other methods in both diversity and accuracy.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"5 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129417177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial landmark detection transfer learning for a specific user in driver status monitoring systems","authors":"Jaechul Kim, K. Taguchi, Yusuke Hayashi, Jungo Miyazaki, H. Fujiyoshi","doi":"10.23919/MVA51890.2021.9511385","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511385","url":null,"abstract":"The wide variety of human faces make it nearly impossible to prepare a complete training data set for facial landmark detection. Because of this, the performance of facial landmark detection is unlikely to be sufficient for driver status monitoring (DSM) systems. To improve the performance for a specific person (SP) by collecting data about that person, we propose the generator and discriminator model using the Lucas-Kanade assistance (GDA) algorithm for compiling a training data set. Even when data for a specific user can be collected, another issue is how to efficiently, effectively, and quickly re-train the model using an insufficient data set. To address this problem, we propose a novel method of transfer learning in the context of composite backbone networks (GBNet). The assistant backbone of GBNet is trained on a large unspecified people (USP) data set in the source domain and transfers its representation to the lead backbone, which is trained by a small SP data set in the target domain. In addition, we design an assistance loss function with output that is not only close to the SP data set, but also consistent with a USP data set with respect to labeled images. We test the proposed method using the 300 Videos in the Wild (300VW) data set and our own data set. Furthermore, show that the proposed method improves the stability of predictions. We expect our method to contribute to the realization of stable DSM systems.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126663479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AVM Image Quality Enhancement by Synthetic Image Learning for Supervised Deblurring","authors":"Kazutoshi Akita, Masayoshi Hayama, Haruya Kyutoku, N. Ukita","doi":"10.23919/MVA51890.2021.9511398","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511398","url":null,"abstract":"An Around View Monitoring (AVM) system is widely used to allow a driver to watch the situation around a car. The AVM image is generated by image distortion correction and viewpoint transformation for images captured by wide view-angle cameras installed on the car. However, the AVM image is blurred due to these transformations. This blur impairs the visibility of the driver. While many deblurring methods based on CNN have been proposed, these general-purpose de-blurring methods are not designed for the AVM image. (1) Since the blur level in the AVM image is region-dependent, deblurring for the AVM should also be region-dependent. (2) Furthermore, while supervised deblurring methods require a pair of input-blurred and output-deblurred images, it is not easy to collect the deblurred AVM image. This paper proposes a method for generating the pairs of training images that cope with the aforementioned two problems. These training images are generated by the inverse transformation of the AVM image generation process. Experimental results show that our method can suppress blur on AVM images. We also confirmed that even a very shallow CNN with the inference time of 2.1ms has the same performance as the SoTA model.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125652588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Japanese Sentence Dataset for Lip- reading","authors":"Tatsuya Shirakata, T. Saitoh","doi":"10.23919/MVA51890.2021.9511353","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511353","url":null,"abstract":"This research is about lip-reading for Japanese sentences. Research on English sentences is actively pursued due to the extensive datasets. However, a sufficient dataset for Japanese sentences has not been released. Therefore, this paper builds a Japanese sentence dataset. A Transformer model is used for the recognition task. Three recognition target levels: phoneme, mora, and vowel, are set, and recognition experiments show that they can be recognized.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131674746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-physical and Temporal Feature Based Self-correcting Approximation Model for Monocular 3D Volleyball Trajectory Analysis","authors":"J. Dong, Xina Cheng, T. Ikenaga","doi":"10.23919/MVA51890.2021.9511408","DOIUrl":"https://doi.org/10.23919/MVA51890.2021.9511408","url":null,"abstract":"Benefiting from the low venue requirements and deployment cost, analysis of 3D volleyball trajectory from monocular vision sensor is of important significance to volleyball game analysis and training assisting. Because of the monocular vision limitation, complicated ball trajectory caused by physical factors and model drifting owing to distance information loss are two governing challenges. This paper proposes a multi-physical factors and self-cor-recting trajectory approximation model. Also, a trajectory correction algorithm based on temporal motion features is proposed. For the first challenge, air resistance factor and gravity factor which mostly impact volleyball during flying are considered to simulate ball motion status. The approximation model parameters are evaluated and corrected during model calculating to reduce calculation error. To limiting model drifting, volleyball movement characteristics based on temporal motion feature is applied to correct approximated trajectory. The success rate of proposed monocular 3D trajectory approximation method achieves 82.5% which has 47.0% improvement comparing with conventional work.","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133851754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of MVA 2021 17th International Conference on Machine Vision Applications","authors":"","doi":"10.23919/mva51890.2021.9511373","DOIUrl":"https://doi.org/10.23919/mva51890.2021.9511373","url":null,"abstract":"","PeriodicalId":312481,"journal":{"name":"2021 17th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124075254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}