{"title":"MFFPN: an Anchor-Free Method for Patent Drawing Object Detection","authors":"Yu-Hsien Chen, Chih-Yi Chiu","doi":"10.23919/MVA57639.2023.10216017","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216017","url":null,"abstract":"A patent document may contain meaningful drawings that can be used for image retrieval. However, labeling drawing locations manually is time-consuming. Since this work is similar to object detection, some object detection techniques can be employed to facilitate it. In this paper, we propose a new anchor-free object detection method for this purpose. The proposed method contains two parts, namely, max filtering feature pyramid network (MFFPN) and dilated sample selection loss (DSSL). We replace feature pyramid network and path aggregation network by 3D max pooling for multi-scale feature fusion with MFFPN. By using DSSL, we can adaptively select training samples according to the ground truth size. Experimental results show the proposed method can achieve a better performance compared with the state-of-the-art anchor-free methods on Taiwan patent dataset.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122688087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASD-EVNet: An Ensemble Vision Network based on Facial Expression for Autism Spectrum Disorder Recognition","authors":"Assil Jaby, Md Baharul Islam, Md Atiqur Rahman Ahad","doi":"10.23919/MVA57639.2023.10215688","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215688","url":null,"abstract":"Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects individuals’ social interaction, communication, and behavior. Early diagnosis and intervention are critical for the well-being and development of children with ASD. Available methods for diagnosing ASD are unpredictable (or with limited accuracy) or require significant time and resources. We aim to enhance the precision of ASD diagnosis by utilizing facial expressions, a readily accessible and limited time-consuming approach. This paper presents ASD Ensemble Vision Network (ASD-EVNet) for recognizing ASD based on facial expressions. The model utilizes three Vision Transformer (ViT) architectures, pre-trained on imageNet-21K and fine-tuned on the ASD dataset. We also develop an extensive collection of facial expression-based ASD dataset for children (FADC). The ensemble learning model was then created by combining the predictions of the three ViT models and feeding it to a classifier. Our experiments demonstrate that the proposed ensemble learning model outperforms and achieves state-of-the-art results in detecting ASD based on facial expressions.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128883697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YOLOv5 with Mixed Backbone for Efficient Spatio-Temporal Hand Gesture Localization and Recognition","authors":"Luis Acevedo-Bringas, Gibran Benitez-Garcia, J. Olivares-Mercado, Hiroki Takahashi","doi":"10.23919/MVA57639.2023.10215605","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215605","url":null,"abstract":"Spatio-temporal Hand Gesture Localization and Recognition (SHGLR) refers to analyzing the spatial and temporal aspects of hand movements for detecting and identifying hand gestures in a video. Current state-of-the-art approaches for SHGLR utilize large and complex architectures that result in a high computational cost. To address this issue, we present a new efficient method based on a mixed backbone for YOLOv5. We decided to use it since it is a lightweight and one-stage framework. We designed a mixed backbone that combines 2D and 3D convolutions to obtain temporal information from previous frames. The proposed method offers an efficient way to perform SHGLR on videos by inflating specific convolutions of the backbone while keeping a similar computational cost to the conventional YOLOv5. Due to its challenging and continuous hand gestures, we conduct experiments using the IPN Hand dataset. Our proposed method achieves a frame mAP@0.5 of 66.52% with a 6-frame clip input, outperforming conventional YOLOv5 by 7.89%, demonstrating the effectiveness of our approach.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128485669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An X3D Neural Network Analysis for Runner’s Performance Assessment in a Wild Sporting Environment","authors":"David Freire-Obregón, J. Lorenzo-Navarro, Oliverio J. Santana, D. Hernández-Sosa, M. C. Santana","doi":"10.23919/MVA57639.2023.10215918","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215918","url":null,"abstract":"We present a transfer learning analysis on a sporting environment of the expanded 3D (X3D) neural networks. Inspired by action quality assessment methods in the literature, our method uses an action recognition network to estimate athletes’ cumulative race time (CRT) during an ultra-distance competition. We evaluate the performance considering the X3D, a family of action recognition networks that expand a small 2D image classification architecture along multiple network axes, including space, time, width, and depth. We demonstrate that the resulting neural network can provide remarkable performance for short input footage, with a mean absolute error of 12 minutes and a half when estimating the CRT for runners who have been active from 8 to 20 hours. Our most significant discovery is that X3D achieves state-of-the-art performance while requiring almost seven times less memory to achieve better precision than previous work.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130883285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BandRe: Rethinking Band-Pass Filters for Scale-Wise Object Detection Evaluation","authors":"Yosuke Shinya","doi":"10.23919/MVA57639.2023.10216132","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216132","url":null,"abstract":"Scale-wise evaluation of object detectors is important for real-world applications. However, existing metrics are either coarse or not sufficiently reliable. In this paper, we propose novel scale-wise metrics that strike a balance between fineness and reliability, using a filter bank consisting of triangular and trapezoidal band-pass filters. We conduct experiments with two methods on two datasets and show that the proposed metrics can highlight the differences between the methods and between the datasets. Code is available at https://github.com/shinya7y/UniverseNet.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124496541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results","authors":"Yuki Kondo, N. Ukita, Takayuki Yamaguchi, Haoran Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yuelong Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, I. Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, S. Yasui","doi":"10.23919/MVA57639.2023.10215935","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215935","url":null,"abstract":"Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, which is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The detail of the challenge with the SOD4SB dataset 1 is introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset 2, the baseline code 3, and the website for evaluation on the public testset 4 are publicly available.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131741645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models *","authors":"Marija Ivanovska, Vitomir Štruc, J. Pers","doi":"10.23919/MVA57639.2023.10215774","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215774","url":null,"abstract":"Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms in particular, are commonly used for fruit segmentation, enabling in-depth analysis of the harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large and challenging dataset of greenhouse tomatoes. The dataset comprises high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and Tomatopia are available at https://github.com/MIvanovska/TomatoDIFF.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127519277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Everyday Robot Navigation","authors":"Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura","doi":"10.23919/MVA57639.2023.10215686","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215686","url":null,"abstract":"The recently emerging research area in robotics, ground view change detection, suffers from its ill-posed-ness because of visual uncertainty combined with complex nonlinear perspective projection. To regularize the ill-posed-ness, the commonly applied supervised learning methods (e.g., CSCD-Net) rely on manually annotated high-quality object-class-specific priors. In this work, we consider general application domains where no manual annotation is available and present a fully self-supervised approach. The proposed approach adopts the powerful and versatile idea that object changes detected during everyday robot navigation can be reused as additional priors to improve future change detection tasks. Furthermore, a robustified framework is implemented and verified experimentally in a new challenging practical application scenario: ground-view small object change detection.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129433232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Reconstruction of Semantic 3D Models from 2D Floor Plans","authors":"Astrid Barreiro, Mariusz Trzeciakiewicz, A. Hilsmann, P. Eisert","doi":"10.23919/MVA57639.2023.10215746","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215746","url":null,"abstract":"Digitalization of existing buildings and the creation of 3D BIM models for them has become crucial for many tasks. Of particular importance are floor plans, which contain information about building layouts and are vital for processes such as construction, maintenance or refurbishing. However, this data is not always available in digital form, especially for older buildings constructed before CAD tools were widely available, or lacks semantic information. The digitalization of such information usually requires an expert to reconstruct the layouts by hand, which is a cumbersome and error-prone process. In this paper, we present a pipeline for reconstruction of vectorized 3D models from scanned 2D plans, aiming at increasing the efficiency of this process. The method presented achieves state-of-the-art results in the public dataset CubiCasa5k [8], and shows good generalization to different types of plans. Our vectorization approach is particularly effective, outperforming previous methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132923933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach","authors":"Martin Knoche, G. Rigoll","doi":"10.23919/MVA57639.2023.10216168","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216168","url":null,"abstract":"Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine can’t correctly classify. This paper investigates the effect of a combination of machine and human operators in the face verification task. First, we look closer at the edge cases for several state-of-the-art models to discover common datasets’ challenging settings. Then, we conduct a study with 60 participants on these selected tasks with humans and provide an extensive analysis. Finally, we demonstrate that combining machine and human decisions can further improve the performance of state-of-the-art face verification systems on various benchmark datasets. Code and data are publicly available on GitHub 1.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132014759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}