2023 18th International Conference on Machine Vision and Applications (MVA): Latest Publications

MFFPN: an Anchor-Free Method for Patent Drawing Object Detection
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216017
Yu-Hsien Chen, Chih-Yi Chiu
Abstract: A patent document may contain meaningful drawings that can be used for image retrieval. However, labeling drawing locations manually is time-consuming. Since this task is similar to object detection, object detection techniques can be employed to facilitate it. In this paper, we propose a new anchor-free object detection method for this purpose. The proposed method comprises two parts: a max filtering feature pyramid network (MFFPN) and a dilated sample selection loss (DSSL). With MFFPN, we replace the feature pyramid network and path aggregation network with 3D max pooling for multi-scale feature fusion. With DSSL, we adaptively select training samples according to the ground-truth size. Experimental results show that the proposed method achieves better performance than state-of-the-art anchor-free methods on a Taiwan patent dataset.
Citations: 0
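The abstract above describes fusing multi-scale features by max pooling across pyramid levels rather than with an FPN/PAN. The following is a minimal sketch of that idea; the nearest-neighbour upsampling, the toy feature-map sizes, and the element-wise max over levels are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of multi-scale fusion via max pooling across pyramid
# levels, in the spirit of MFFPN as described in the abstract. Upsampling
# scheme and map sizes are assumptions for illustration only.

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of a 2D feature map (list of lists)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def max_fuse(pyramid):
    """Resize every pyramid level to the finest resolution, then take the
    element-wise max across levels -- i.e. max pooling along the 'level'
    (depth) axis of the stacked 3D volume."""
    out_h, out_w = len(pyramid[0]), len(pyramid[0][0])
    resized = [upsample_nearest(f, out_h, out_w) for f in pyramid]
    return [[max(f[r][c] for f in resized) for c in range(out_w)]
            for r in range(out_h)]

# Two toy levels: a fine 4x4 map and a coarse 2x2 map.
fine = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
coarse = [[20, 0],
          [0, 0]]
fused = max_fuse([fine, coarse])
print(fused[0][0])  # -> 20: top-left takes the coarse level's larger activation
```

The max across levels lets whichever scale responds most strongly dominate the fused map, which is one plausible reading of why a pooling-based fusion can stand in for the learned lateral connections of an FPN.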
ASD-EVNet: An Ensemble Vision Network based on Facial Expression for Autism Spectrum Disorder Recognition
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215688
Assil Jaby, Md Baharul Islam, Md Atiqur Rahman Ahad
Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects individuals' social interaction, communication, and behavior. Early diagnosis and intervention are critical for the well-being and development of children with ASD. Available methods for diagnosing ASD are either of limited accuracy or require significant time and resources. We aim to enhance the precision of ASD diagnosis by utilizing facial expressions, a readily accessible and minimally time-consuming signal. This paper presents the ASD Ensemble Vision Network (ASD-EVNet) for recognizing ASD based on facial expressions. The model utilizes three Vision Transformer (ViT) architectures, pre-trained on ImageNet-21K and fine-tuned on the ASD dataset. We also develop an extensive facial-expression-based ASD dataset for children (FADC). The ensemble learning model is then created by combining the predictions of the three ViT models and feeding them to a classifier. Our experiments demonstrate that the proposed ensemble learning model outperforms existing approaches and achieves state-of-the-art results in detecting ASD based on facial expressions.
Citations: 0
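The core ensemble step in the abstract is combining the predictions of three ViT backbones. A minimal sketch of one common combination rule, probability averaging, is below; note this is an assumption for illustration — the paper feeds the combined predictions to a learned classifier, and the two-class setup and toy scores here are invented.

```python
# Hedged sketch of prediction combination for an ensemble of three models.
# Averaging is one standard rule; the paper's actual classifier on top of
# the combined predictions may differ.

def ensemble_predict(prob_sets):
    """Average per-model probability vectors and return (argmax class, avg)."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    avg = [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Toy softmax outputs of three ViT models for one face image
# (class 0 = non-ASD, class 1 = ASD -- labels are illustrative).
vit_outputs = [[0.7, 0.3], [0.6, 0.4], [0.3, 0.7]]
label, avg = ensemble_predict(vit_outputs)
print(label, avg)
```

Averaging damps the influence of any single model's overconfident mistake, which is the usual motivation for ensembling several differently fine-tuned backbones.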
YOLOv5 with Mixed Backbone for Efficient Spatio-Temporal Hand Gesture Localization and Recognition
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215605
Luis Acevedo-Bringas, Gibran Benitez-Garcia, J. Olivares-Mercado, Hiroki Takahashi
Abstract: Spatio-temporal Hand Gesture Localization and Recognition (SHGLR) refers to analyzing the spatial and temporal aspects of hand movements to detect and identify hand gestures in a video. Current state-of-the-art approaches for SHGLR use large, complex architectures with high computational cost. To address this issue, we present a new efficient method based on a mixed backbone for YOLOv5, chosen because it is a lightweight, one-stage framework. The mixed backbone combines 2D and 3D convolutions to obtain temporal information from previous frames. The proposed method performs SHGLR on videos efficiently by inflating specific convolutions of the backbone while keeping a computational cost similar to that of the conventional YOLOv5. We conduct experiments on the IPN Hand dataset because of its challenging, continuous hand gestures. Our proposed method achieves a frame mAP@0.5 of 66.52% with a 6-frame clip input, outperforming conventional YOLOv5 by 7.89% and demonstrating the effectiveness of our approach.
Citations: 0
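The abstract says specific 2D convolutions of the backbone are "inflated" into 3D ones. The sketch below shows the standard inflation trick popularized by I3D — replicate a 2D kernel along a new temporal axis and rescale — as one plausible reading; the exact initialization used in the paper is an assumption here.

```python
# Hedged sketch of kernel "inflation": turning a 2D convolution kernel into
# a 3D one over a clip of frames. Replicating and dividing by the temporal
# depth makes the 3D conv initially reproduce the 2D output on a static clip.

def inflate_kernel(kernel2d, depth):
    """Replicate a 2D kernel `depth` times along a new temporal axis,
    rescaling each copy by 1/depth."""
    return [[[w / depth for w in row] for row in kernel2d]
            for _ in range(depth)]

k2d = [[1.0, 2.0],
       [3.0, 4.0]]
k3d = inflate_kernel(k2d, depth=3)

# Summing the inflated kernel over the temporal axis recovers the original
# 2D kernel, so a static video initially yields the pretrained 2D response.
total = [[sum(k3d[t][r][c] for t in range(3)) for c in range(2)]
         for r in range(2)]
print(total)
```

This property is what lets an inflated backbone reuse 2D pretrained weights while gaining access to temporal context from previous frames.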
An X3D Neural Network Analysis for Runner's Performance Assessment in a Wild Sporting Environment
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-22 DOI: 10.23919/MVA57639.2023.10215918
David Freire-Obregón, J. Lorenzo-Navarro, Oliverio J. Santana, D. Hernández-Sosa, M. C. Santana
Abstract: We present a transfer learning analysis of expanded 3D (X3D) neural networks in a sporting environment. Inspired by action quality assessment methods in the literature, our method uses an action recognition network to estimate athletes' cumulative race time (CRT) during an ultra-distance competition. We evaluate X3D, a family of action recognition networks that expand a small 2D image classification architecture along multiple network axes, including space, time, width, and depth. We demonstrate that the resulting neural network provides remarkable performance on short input footage, with a mean absolute error of 12.5 minutes when estimating the CRT for runners who have been active for 8 to 20 hours. Our most significant finding is that X3D achieves state-of-the-art performance while requiring almost seven times less memory than previous work.
Citations: 0
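The headline number in the abstract is a mean absolute error (MAE) of 12.5 minutes on CRT regression. For reference, a minimal sketch of how that metric is computed; the toy race times below are made up, not data from the paper.

```python
# Mean absolute error between predicted and ground-truth cumulative race
# times (CRT), the evaluation metric quoted in the abstract.

def mean_absolute_error(pred, true):
    """Average absolute deviation between predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

pred_crt = [480.0, 615.0, 1180.0]   # predicted CRT in minutes (toy values)
true_crt = [470.0, 630.0, 1200.0]   # ground-truth CRT in minutes (toy values)
print(mean_absolute_error(pred_crt, true_crt))  # -> 15.0
```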
BandRe: Rethinking Band-Pass Filters for Scale-Wise Object Detection Evaluation
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-21 DOI: 10.23919/MVA57639.2023.10216132
Yosuke Shinya
Abstract: Scale-wise evaluation of object detectors is important for real-world applications. However, existing metrics are either coarse or not sufficiently reliable. In this paper, we propose novel scale-wise metrics that strike a balance between fineness and reliability, using a filter bank consisting of triangular and trapezoidal band-pass filters. We conduct experiments with two methods on two datasets and show that the proposed metrics can highlight the differences between the methods and between the datasets. Code is available at https://github.com/shinya7y/UniverseNet.
Citations: 1
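The filter bank in the abstract is built from triangular and trapezoidal band-pass filters over object scale. Below is an illustrative sketch of the triangular case; the band edges and the choice of scale measure (e.g., sqrt of box area) are assumptions for illustration, not the paper's actual filter bank — see the linked repository for the real definition.

```python
# Illustrative triangular band-pass filter over object scale: each
# detection contributes to a scale band with a weight that peaks at the
# band center and falls to zero at the band edges. Band edges here are
# assumed values, not the paper's.

def triangular_weight(scale, lo, center, hi):
    """Weight in [0, 1] for an object of the given scale within the band
    (lo, hi), peaking at `center`."""
    if scale <= lo or scale >= hi:
        return 0.0
    if scale <= center:
        return (scale - lo) / (center - lo)
    return (hi - scale) / (hi - center)

# A 32-pixel object sits at the peak of a 16-64 band; a 24-pixel object
# contributes with half weight.
print(triangular_weight(32, 16, 32, 64))  # -> 1.0
print(triangular_weight(24, 16, 32, 64))  # -> 0.5
```

Compared with the hard scale brackets of COCO-style small/medium/large AP, such soft band weights avoid abrupt changes in a metric when an object's size crosses a threshold.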
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-18 DOI: 10.23919/MVA57639.2023.10215935
Yuki Kondo, N. Ukita, Takayuki Yamaguchi, Haoran Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yuelong Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, I. Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, S. Yasui
Abstract: Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, called the Small Object Detection for Spotting Birds (SOD4SB) dataset, and details the challenge held with it. In total, 223 participants joined the challenge, and the award-winning methods are briefly introduced. The dataset, the baseline code, and the website for evaluation on the public test set are publicly available.
Citations: 3
TomatoDIFF: On-plant Tomato Segmentation with Denoising Diffusion Models
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-03 DOI: 10.23919/MVA57639.2023.10215774
Marija Ivanovska, Vitomir Štruc, J. Pers
Abstract: Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms in particular are commonly used for fruit segmentation, enabling in-depth analysis of harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large, and challenging dataset of greenhouse tomatoes comprising high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and Tomatopia is available at https://github.com/MIvanovska/TomatoDIFF.
Citations: 2
Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Everyday Robot Navigation
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-06-28 DOI: 10.23919/MVA57639.2023.10215686
Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura
Abstract: Ground-view change detection, a recently emerging research area in robotics, suffers from ill-posedness because of visual uncertainty combined with complex nonlinear perspective projection. To regularize the ill-posedness, commonly applied supervised learning methods (e.g., CSCD-Net) rely on manually annotated, high-quality, object-class-specific priors. In this work, we consider general application domains where no manual annotation is available and present a fully self-supervised approach. The proposed approach adopts the powerful and versatile idea that object changes detected during everyday robot navigation can be reused as additional priors to improve future change detection tasks. Furthermore, a robustified framework is implemented and verified experimentally in a new, challenging, practical application scenario: ground-view small object change detection.
Citations: 1
Automatic Reconstruction of Semantic 3D Models from 2D Floor Plans
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-06-02 DOI: 10.23919/MVA57639.2023.10215746
Astrid Barreiro, Mariusz Trzeciakiewicz, A. Hilsmann, P. Eisert
Abstract: Digitalization of existing buildings and the creation of 3D BIM models for them has become crucial for many tasks. Of particular importance are floor plans, which contain information about building layouts and are vital for processes such as construction, maintenance, or refurbishing. However, this data is not always available in digital form, especially for older buildings constructed before CAD tools were widely available, or it lacks semantic information. Digitizing such information usually requires an expert to reconstruct the layouts by hand, a cumbersome and error-prone process. In this paper, we present a pipeline for reconstructing vectorized 3D models from scanned 2D plans, aiming to increase the efficiency of this process. The presented method achieves state-of-the-art results on the public CubiCasa5k dataset [8] and shows good generalization to different types of plans. Our vectorization approach is particularly effective, outperforming previous methods.
Citations: 0
Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-04-17 DOI: 10.23919/MVA57639.2023.10216168
Martin Knoche, G. Rigoll
Abstract: Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine cannot correctly classify. This paper investigates the effect of combining machine and human operators in the face verification task. First, we look closely at the edge cases for several state-of-the-art models to discover challenging settings common across datasets. Then, we conduct a study on these selected tasks with 60 human participants and provide an extensive analysis. Finally, we demonstrate that combining machine and human decisions can further improve the performance of state-of-the-art face verification systems on various benchmark datasets. Code and data are publicly available on GitHub.
Citations: 0