2023 18th International Conference on Machine Vision and Applications (MVA): Latest Publications

Multi-Prior Based Multi-Scale Condition Network for Single-Image HDR Reconstruction
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216063
Haorong Jiang, Fengshan Zhao, Junda Liao, Qin Liu, T. Ikenaga
Abstract: High Dynamic Range (HDR) imaging aims to reconstruct the natural appearance of real-world scenes by expanding the bit depth of captured images. However, due to the imaging pipeline of off-the-shelf cameras, information loss in over-exposed areas and noise in under-exposed areas pose significant challenges for single-image HDR imaging. As a result, the key to success lies in restoring over-exposed regions and denoising under-exposed regions. In this paper, a multi-prior based multi-scale condition network is proposed to address this issue. (1) Three types of prior knowledge modulate the intermediate features in the reconstruction network from different perspectives, resulting in improved modulation effects. (2) Multi-scale fusion extracts and integrates deep semantic information from various priors. Experiments on the NTIRE HDR challenge dataset demonstrate that the proposed method achieves state-of-the-art quantitative results.
Citations: 0
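A minimal sketch of prior-based feature modulation. The abstract does not specify the modulation mechanism, so this assumes a spatial-feature-transform (SFT) style conditioning, a common choice in which each prior predicts a per-pixel scale and shift for the intermediate features; the module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class PriorModulation(nn.Module):
    """Condition reconstruction features on one prior map (SFT-style).

    Assumes `feat` and `prior` share the same spatial resolution; one such
    block per prior would modulate features "from different perspectives".
    """
    def __init__(self, feat_ch: int, prior_ch: int):
        super().__init__()
        self.to_scale = nn.Sequential(
            nn.Conv2d(prior_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1))
        self.to_shift = nn.Sequential(
            nn.Conv2d(prior_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1))

    def forward(self, feat: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # Affine modulation: the prior decides how to rescale and shift
        # each feature location (e.g., boosting over-exposed regions).
        return feat * (1 + self.to_scale(prior)) + self.to_shift(prior)
```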
TinyPedSeg: A Tiny Pedestrian Segmentation Benchmark for Top-Down Drone Images
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215829
Y. Sahin, Elvin Abdinli, M. A. Aydin, Gozde Unal
Abstract: The usage of Unmanned Aerial Vehicles (UAVs) has significantly increased in various fields such as surveillance, agriculture, transportation, and military operations. However, the integration of UAVs in these applications requires the ability to navigate autonomously and detect/segment objects in real time, which can be achieved through the use of neural networks. Although object detection for RGB images/videos obtained from UAVs has been widely studied, limited effort has been made toward segmentation from top-down aerial images. When the UAV is extremely high above the ground, the task can be framed as tiny object segmentation. Thus, inspired by the TinyPerson dataset, which focuses on person detection from UAVs, we present TinyPedSeg, which contains 2563 pedestrians in 320 images. Specializing in pedestrian segmentation, our dataset is more informative than other UAV segmentation datasets. The dataset and the baseline code are available at https://github.com/ituvisionlab/tinypedseg
Citations: 0
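As a rough illustration of how such a benchmark is typically scored (this is not the repository's official evaluation code), mask IoU is the standard segmentation metric; with tiny pedestrians, even a handful of mispredicted pixels moves it noticeably.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:   # both masks empty: count as a perfect match
        return 1.0
    return float(np.logical_and(pred, gt).sum()) / float(union)
```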
Safe height estimation of deformable objects for picking robots by detecting multiple potential contact points
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215690
Jaesung Yang, Daisuke Hagihara, Kiyoto Ito, Nobuhiro Chihara
Abstract: Object sorting in logistics warehouses is still carried out manually, and there is a great need for automation with arm robots. Target objects should be placed carefully in situations where careful handling of products is important. We propose a method for estimating the height of a picked object with a single depth camera to achieve precise placement of items, such as stacking, especially for objects that are deformable, e.g., bags. The proposed method detects multiple potential contact points of a picked object to estimate the appropriate height at which to place the object, using the point-cloud difference before and after picking. The validity of the proposed method was verified using 26 cases in which deformable objects were placed inside a container, and it was confirmed that object-height estimation is possible with an average error of 3.2 mm.
Citations: 0
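A sketch of the point-cloud-difference idea from the abstract, reduced to depth images for brevity. Assumptions not stated in the paper: the two depth maps are registered, taken by a downward-looking camera in meters, and `diff_thresh`/`frac` are hypothetical tuning parameters.

```python
import numpy as np

def potential_contact_depths(depth_before: np.ndarray,
                             depth_after: np.ndarray,
                             diff_thresh: float = 0.005,
                             frac: float = 0.01) -> np.ndarray:
    """Pixels whose depth changed after picking are taken to belong to the
    picked object; the deepest of those (farthest from a downward-looking
    camera, i.e., the lowest surface points) are candidate contact points."""
    changed = np.abs(depth_after - depth_before) > diff_thresh
    obj = depth_before[changed]
    assert obj.size > 0, "picking should have changed the scene"
    k = max(1, int(frac * obj.size))   # keep several candidates, not one,
    return np.sort(obj)[-k:]           # to be robust to sensor noise
```

A safe placing height could then be derived from these candidates, e.g., lowering the gripper until the highest candidate reaches the target surface.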
Video Anomaly Detection Using Encoder-Decoder Networks with Video Vision Transformer and Channel Attention Blocks
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215921
Shimpei Kobayashi, A. Hizukuri, R. Nakayama
Abstract: Surveillance cameras have been introduced in various locations for public safety. However, continuously watching surveillance footage that contains few abnormal events is tedious for security personnel. The purpose of this study is to develop a computerized anomaly detection method for surveillance camera videos. Our database consisted of three public datasets for anomaly detection: the UCSD Pedestrian 1, UCSD Pedestrian 2, and CUHK Avenue datasets. In the proposed network, channel attention blocks were introduced into TransAnomaly, one of the existing anomaly detection networks, to focus on important channel information. The areas under the receiver operating characteristic curves (AUCs) with the proposed network were 0.827 for UCSD Pedestrian 1, 0.964 for UCSD Pedestrian 2, and 0.854 for CUHK Avenue. The AUCs for the proposed network were greater than those for a conventional TransAnomaly without channel attention blocks (0.767, 0.934, and 0.839).
Citations: 0
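The abstract does not detail the channel attention design; a widely used form is the squeeze-and-excitation block, which re-weights feature channels using globally pooled statistics. A minimal sketch under that assumption:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel re-weighting."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool -> (B, C)
        return x * w[:, :, None, None]    # excite: per-channel gating
```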
Transformer with Task Selection for Continual Learning
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215673
Sheng-Kai Huang, Chun-Rong Huang
Abstract: The goal of continual learning is to let models continuously learn new incoming knowledge without catastrophic forgetting. To address this issue, we propose a transformer-based framework with a task selection module. The task selection module selects corresponding task tokens to assist the learning of incoming samples of new tasks. For previous samples, the selected task tokens retain the previous knowledge to assist the prediction of samples of learned classes. Compared with state-of-the-art methods, our method achieves good performance on the CIFAR-100 dataset, especially on the last task, showing that it better prevents catastrophic forgetting.
Citations: 0
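The abstract leaves the selection rule unspecified; one plausible reading, similar to prompt-selection methods in continual learning, matches a sample's query feature against learned per-task keys and returns the winning task token. A sketch under that assumption (all names are hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskTokenSelector(nn.Module):
    """Pick one learned task token per sample via key matching."""
    def __init__(self, num_tasks: int, dim: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_tasks, dim))    # matched against queries
        self.tokens = nn.Parameter(torch.randn(num_tasks, dim))  # fed to the transformer

    def forward(self, query: torch.Tensor) -> torch.Tensor:      # query: (B, dim)
        # Cosine similarity between each query and every task key -> (B, num_tasks)
        sim = F.cosine_similarity(query[:, None, :], self.keys[None], dim=-1)
        return self.tokens[sim.argmax(dim=1)]                    # (B, dim) selected tokens
```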
Investigating Self-Supervised Learning for Skin Lesion Classification
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215580
Takumi Morita, X. Han
Abstract: Skin cancer is one of the most common cancers worldwide and is a growing global health issue due to the weakening of natural protection against harmful ultraviolet radiation. Early diagnosis and proper treatment, even for the deadliest malignant melanoma, can greatly increase the survival rate. Thus, computer-aided diagnosis for skin lesions has been actively explored and has made remarkable progress in medical practice, benefiting from the great advances of deep convolutional neural networks in vision tasks. However, most studies in skin lesion/cancer recognition and detection focus on constructing a robust prediction model with annotated training samples in a fully supervised manner, and cannot make full use of the available unlabeled data. This study investigates self-supervised learning using a large amount of unlabeled skin lesion images to train a good initial network for representation learning, and transfers the knowledge of the initial model to the supervised skin lesion classification task with a small number of annotated samples to enhance performance. Specifically, we employ a negative-sample-free self-supervised framework, leveraging the interaction learning of the online and target networks to enforce representation robustness with only positive samples. Moreover, based on the observed potential variations in the target skin images, we select adaptive augmentation methods to produce the transformed positive views for self-supervised learning. Extensive experiments on two benchmark skin lesion datasets demonstrated that the proposed self-supervised pre-training can stably improve recognition performance with different numbers of labeled images compared with the baseline models.
Citations: 0
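The negative-sample-free online/target interaction described here matches the BYOL family of objectives; the core loss can be sketched as below (whether the paper uses exactly this formulation is an assumption).

```python
import torch
import torch.nn.functional as F

def negative_free_loss(online_pred: torch.Tensor,
                       target_proj: torch.Tensor) -> torch.Tensor:
    """Pull the online network's prediction of one augmented view toward the
    (gradient-stopped) target network's projection of another view."""
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_proj.detach(), dim=-1)  # stop-gradient on target branch
    return (2 - 2 * (p * z).sum(dim=-1)).mean()    # MSE between unit vectors
```

The target network's weights are typically an exponential moving average of the online network's, which is what prevents representational collapse without any negative pairs.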
Age Prediction From Face Images Via Contrastive Learning
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216074
Yeongnam Chae, Poulami Raha, Mijung Kim, B. Stenger
Abstract: This paper presents a novel approach for accurately estimating age from face images, which overcomes the challenge of collecting a large dataset of individuals with the same identity at different ages. Instead, we leverage readily available face datasets of different people at different ages and aim to extract age-related features using contrastive learning. Our method emphasizes these relevant features while suppressing identity-related features using a combination of cosine similarity and triplet margin losses. We demonstrate the effectiveness of our proposed approach by achieving state-of-the-art performance on two public datasets, FG-NET and MORPH II.
Citations: 0
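A sketch of combining the two losses named in the abstract. How the triplets are formed is an assumption here: anchor and positive share a similar age across different identities, while the negative has a distant age.

```python
import torch
import torch.nn.functional as F

def age_embedding_loss(anchor: torch.Tensor, positive: torch.Tensor,
                       negative: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Cosine term pulls same-age pairs together; the triplet margin term
    pushes distant-age samples apart, suppressing identity cues."""
    a, p, n = (F.normalize(t, dim=-1) for t in (anchor, positive, negative))
    cosine_term = (1 - (a * p).sum(dim=-1)).mean()
    triplet_term = F.triplet_margin_loss(a, p, n, margin=margin)
    return cosine_term + triplet_term
```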
Hierarchical Spatio-Temporal Neural Network with Displacement Based Refinement for Monocular Head Pose Prediction
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216167
Zhe Xu, Yuan Li, Yuhong Li, Songlin Du, T. Ikenaga
Abstract: Head pose prediction aims to forecast future head poses given an observed sequence, and plays an increasingly important role in human-computer interaction, virtual reality, and driver monitoring. However, since there are many possible motions, current head pose works, which mainly focus on estimation, fail to provide sufficient temporal information to meet the high demands for accurate prediction. This paper proposes (A) a Spatio-Temporal Encoder (STE), (B) a displacement-based offset generating module, and (C) a time step feature aggregation module. The STE extracts spatial information via a Transformer and temporal information according to the time order of frames. The displacement-based offset generating module utilizes displacement information, through a frequency-domain process between adjacent frames, to generate an offset that refines the prediction result. Furthermore, the time step feature aggregation module integrates time step features based on information density and hierarchically extracts past motion information as prior knowledge to capture motion recurrence. Extensive experiments have shown that the proposed network outperforms related methods, achieving a Mean Absolute Error (MAE) of 4.5865° on simple background sequences and 7.1325° on complex background sequences.
Citations: 0
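A rough sketch of a frequency-domain displacement refinement in the spirit of module (B): low-pass filter the spectrum of per-frame pose displacements and reuse the smoothed final displacement as a corrective offset. The `keep` cutoff is a hypothetical parameter, and the paper's learned module is certainly more elaborate than this filter.

```python
import torch

def displacement_offset(pose_seq: torch.Tensor, keep: int = 4) -> torch.Tensor:
    """pose_seq: (T, 3) observed yaw/pitch/roll angles in degrees."""
    disp = pose_seq[1:] - pose_seq[:-1]       # (T-1, 3) frame-to-frame displacement
    spec = torch.fft.rfft(disp, dim=0)        # temporal spectrum per angle
    spec[keep:] = 0                           # low-pass: drop jittery high frequencies
    smooth = torch.fft.irfft(spec, n=disp.shape[0], dim=0)
    return smooth[-1]                         # offset added to the raw next-pose prediction
```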
Joint Learning with Group Relation and Individual Action
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215994
Chihiro Nakatani, Hiroaki Kawashima, N. Ukita
Abstract: This paper proposes a method for group relation learning. Different from related work, in which manual annotation of group activities is required for supervised learning, we propose group relation learning without group activity annotation, through the recognition of individual actions, which can be annotated more easily than group activities defined by complex inter-person relationships. Our method extracts features informative for recognizing the action of each person by conditioning the group relation on the location of that person. A variety of experimental results demonstrate that our method outperforms SOTA methods quantitatively and qualitatively on two public datasets.
Citations: 0
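One way to read "conditioning the group relation on the location of that person" is attention whose queries are person features augmented with a location embedding; a sketch under that assumption (the paper's mechanism may differ):

```python
import torch
import torch.nn as nn

class LocationConditionedRelation(nn.Module):
    """Aggregate group context per person with location-conditioned queries."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()                   # dim must be divisible by num_heads
        self.loc_embed = nn.Linear(2, dim)   # (x, y) image coordinates -> dim
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor, locs: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) per-person features; locs: (B, N, 2) positions
        q = feats + self.loc_embed(locs)     # condition each query on its person's location
        ctx, _ = self.attn(q, feats, feats)  # relation to all other people in the scene
        return ctx                           # (B, N, dim) action-informative context
```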
Leveraging Embedding Information to Create Video Capsule Endoscopy Datasets
2023 18th International Conference on Machine Vision and Applications (MVA) | Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215919
Pere Gilabert, C. Malagelada, Hagen Wenzek, Jordi Vitrià, S. Seguí
Abstract: As the field of deep learning continues to expand, it has become increasingly apparent that large volumes of data are needed to train algorithms effectively. This is particularly challenging in the endoscopic capsule field, where obtaining and labeling sufficient data can be expensive and time-consuming. To overcome these challenges, we have developed an automatic method of video selection that uses the diversity of unlabeled videos to identify the most relevant videos for labeling. The findings indicate a significant improvement in performance with the implementation of this new methodology. The system selects relevant and diverse videos, achieving high accuracy in the classification task. This translates to less workload for annotators as they can label fewer videos while maintaining the same accuracy level in the classification task.
Citations: 0
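Diversity-driven selection over video embeddings can be sketched as greedy farthest-point sampling: repeatedly pick the video farthest from everything chosen so far. Whether the paper uses exactly this criterion is an assumption.

```python
import numpy as np

def select_diverse_videos(embeddings: np.ndarray, k: int) -> list:
    """embeddings: (N, D), one vector per unlabeled video; returns k indices
    of a diverse subset to send to annotators (assumes k <= N)."""
    # Start from the video farthest from the mean embedding.
    first = int(np.argmax(np.linalg.norm(embeddings - embeddings.mean(0), axis=1)))
    chosen = [first]
    dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))   # farthest from the currently selected set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen
```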