Proceedings of the 2023 6th International Conference on Machine Vision and Applications: Latest Publications

Object-Based Vehicle Color Recognition in Uncontrolled Environment
Panumate Chetprayoon, Theerat Sakdejayont, Monchai Lertsutthiwong
{"title":"Object-Based Vehicle Color Recognition in Uncontrolled Environment","authors":"Panumate Chetprayoon, Theerat Sakdejayont, Monchai Lertsutthiwong","doi":"10.1145/3589572.3589585","DOIUrl":"https://doi.org/10.1145/3589572.3589585","url":null,"abstract":"The demand for vehicle recognition significantly increases with impact on many businesses in recent decades. This paper focuses on a vehicle color attribute. A novel method for vehicle color recognition is introduced to overcome three challenges of vehicle color recognition. The first challenge is an uncontrolled environment such as shadow, brightness, and reflection. Second, similar color is hard to be taken into account. Third, few research works dedicate to multi-color vehicle recognition. Previous works can provide only color information of the whole vehicle, but not at vehicle part level. In this study, a new approach for recognizing the colors of vehicles at the part level is introduced. It utilizes object detection techniques to identify the colors based on the different objects (e.g. parts of a vehicle in this research). In addition, a novel generic post-processing is proposed to improve robustness in the uncontrolled environment and support not only single-color but also multi-color vehicles. Experimental results show that it can effectively identify the color under the three challenges addressed above with 99 % accuracy for single-color vehicle and outperforms the other seven baseline models, and 76 % accuracy for multi-color vehicle.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114827819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
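
As a rough illustration of the part-level idea, the sketch below crops hypothetical detected part boxes and names each part's color by majority hue vote. The `detect_parts` stub, the hue bins, and the saturation/value thresholds are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: part-level color naming by majority hue vote.
# detect_parts() is a stand-in for any detector returning vehicle part boxes.
import numpy as np
from PIL import Image

HUE_BINS = [("red", 0, 15), ("yellow", 15, 45), ("green", 45, 150),
            ("blue", 150, 250), ("purple", 250, 330), ("red", 330, 360)]

def dominant_color(crop: Image.Image) -> str:
    hsv = np.asarray(crop.convert("HSV"), dtype=np.float32)
    h = hsv[..., 0] * (360.0 / 255.0)      # PIL stores hue in 0..255
    s = hsv[..., 1] / 255.0
    v = hsv[..., 2] / 255.0
    achromatic = s < 0.2                   # low saturation: black/gray/white
    if achromatic.mean() > 0.5:
        mv = v[achromatic].mean()
        return "black" if mv < 0.25 else "white" if mv > 0.75 else "gray"
    votes = {}
    for name, lo, hi in HUE_BINS:
        votes[name] = votes.get(name, 0) + int(((h >= lo) & (h < hi) & ~achromatic).sum())
    return max(votes, key=votes.get)

def detect_parts(image):                   # stub: replace with a real detector
    w, h = image.size
    return [("body", (0, h // 3, w, h)), ("roof", (0, 0, w, h // 3))]

image = Image.open("car.jpg").convert("RGB")   # placeholder path
for part, box in detect_parts(image):
    print(part, dominant_color(image.crop(box)))
```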
SkeletonGAN: Fine-Grained Pose Synthesis of Human-Object Interactions
Qixuan Sun, Nanxi Chen, Ruipeng Zhang, Jiamao Li, Xiaolin Zhang
{"title":"SkeletonGAN: Fine-Grained Pose Synthesis of Human-Object Interactions","authors":"Qixuan Sun, Nanxi Chen, Ruipeng Zhang, Jiamao Li, Xiaolin Zhang","doi":"10.1145/3589572.3589579","DOIUrl":"https://doi.org/10.1145/3589572.3589579","url":null,"abstract":"Synthesizing Human-Object Interactions (HOI) is a challenging problem since the human body has a complex and versatile representation. Existing solutions can generate individual objects or faces very well but still face difficulty in generating realistic human bodies and their interaction with multiple objects. In this work, we focus on synthesizing human poses based on HOI descriptive triplets and introduce a novel perspective that decomposes every action between humans and objects into sub-actions of human body parts to generate body poses in a fine-grained way. We propose SkeletonGAN, a conditional generative adversarial model to perform a body-parts-level control over the interaction between humans and objects. SkeletonGAN is trained and evaluated using the HICO-DET dataset, which is a knowledge base consisting of complex interaction poses of various human-object actions in realistic scenarios. We show through qualitative and quantitative evaluations that this model is capable of generating diverse and plausible poses consistent with the given semantic features, and especially our model can also predict the relative position of the object with the body pose. We also explore synthesizing composite poses that include co-occurring human actions, indicating that the model can learn multimodal relationships between human poses and the given conditional semantic features.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130193441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
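
A minimal sketch of the conditional-GAN idea follows, assuming poses are flat vectors of 2D joint coordinates and the HOI triplet has already been embedded as a fixed-size vector; the layer sizes and 17-joint layout are assumptions, not the paper's architecture.

```python
# Minimal conditional GAN sketch for pose synthesis (not the paper's model):
# the generator maps (noise, HOI-triplet embedding) to 2D joint coordinates;
# the discriminator scores a (pose, embedding) pair as real or fake.
import torch
import torch.nn as nn

NOISE, COND, JOINTS = 64, 128, 17          # assumed dimensions

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE + COND, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, JOINTS * 2), nn.Tanh())   # coords in [-1, 1]

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1)).view(-1, JOINTS, 2)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(JOINTS * 2 + COND, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1))                       # real/fake logit

    def forward(self, pose, cond):
        return self.net(torch.cat([pose.flatten(1), cond], dim=1))

G, D = Generator(), Discriminator()
z, cond = torch.randn(8, NOISE), torch.randn(8, COND)  # cond: triplet embedding stand-in
fake_pose = G(z, cond)                                  # (8, 17, 2)
logit = D(fake_pose, cond)
```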
Multi-Scale Feature Enhancement Network for Face Forgery Detection
Zhiyuan Ma, Xue Mei, Hao Chen, Jienan Shen
{"title":"Multi-Scale Feature Enhancement Network for Face Forgery Detection","authors":"Zhiyuan Ma, Xue Mei, Hao Chen, Jienan Shen","doi":"10.1145/3589572.3589577","DOIUrl":"https://doi.org/10.1145/3589572.3589577","url":null,"abstract":"Nowadays, synthesizing realistic fake face images and videos becomes easy benefiting from the advance in generation technology. With the popularity of face forgery, abuse of the technology occurs from time to time, which promotes the research on face forgery detection to be an emergency. To deal with the potential risks, we propose a face forgery detection method based on multi-scale feature enhancement. Specifically, we analyze the forgery traces from the perspective of texture and frequency domain, respectively. We find that forgery traces are hard to be perceived by human eyes but noticeable in shallow layers of CNNs and middle-frequency domain and high-frequency domain. Hence, to reserve more forgery information, we design a texture feature enhancement module and a frequency domain feature enhancement module, respectively. The experiments on FaceForensics++ dataset and Celeb-DF dataset show that our method exceeds most existing networks and methods, which proves that our method has strong classification ability.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124593534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
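
To make the frequency-band intuition concrete, here is a small sketch that splits a feature map into low/mid/high bands with FFT masks and re-weights each band with a learnable scalar. The band radii and the re-weighting scheme are assumptions, a stand-in for, not a reproduction of, the paper's frequency enhancement module.

```python
# Sketch of frequency-band re-weighting: split a tensor into low/mid/high
# FFT bands and scale each band by a learnable weight (mid/high boosted,
# mirroring where forgery traces are reported to concentrate).
import torch
import torch.nn as nn

def band_masks(h, w, cuts=(0.08, 0.25)):
    fy = torch.fft.fftfreq(h).view(-1, 1)          # cycles/pixel
    fx = torch.fft.fftfreq(w).view(1, -1)
    r = torch.sqrt(fy ** 2 + fx ** 2)
    low = r < cuts[0]
    mid = (r >= cuts[0]) & (r < cuts[1])
    return [low, mid, ~(low | mid)]                # low, mid, high

class FreqReweight(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor([1.0, 1.5, 1.5]))

    def forward(self, x):                          # x: (B, C, H, W)
        X = torch.fft.fft2(x)
        out = torch.zeros_like(x)
        for weight, mask in zip(self.w, band_masks(x.shape[-2], x.shape[-1])):
            out = out + weight * torch.fft.ifft2(X * mask).real
        return out

feat = torch.randn(2, 64, 32, 32)
enhanced = FreqReweight()(feat)
```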
A Multistage Framework for Detection of Very Small Objects
Duleep Rathgamage Don, Ramazan S. Aygun, M. Karakaya
{"title":"A Multistage Framework for Detection of Very Small Objects","authors":"Duleep Rathgamage Don, Ramazan S. Aygun, M. Karakaya","doi":"10.1145/3589572.3589574","DOIUrl":"https://doi.org/10.1145/3589572.3589574","url":null,"abstract":"Small object detection is one of the most challenging problems in computer vision. Algorithms based on state-of-the-art object detection methods such as R-CNN, SSD, FPN, and YOLO fail to detect objects of very small sizes. In this study, we propose a novel method to detect very small objects, smaller than 8×8 pixels, that appear in a complex background. The proposed method is a multistage framework consisting of an unsupervised algorithm and three separately trained supervised algorithms. The unsupervised algorithm extracts ROIs from a high-resolution image. Then the ROIs are upsampled using SRGAN, and the enhanced ROIs are detected by our two-stage cascade classifier based on two ResNet50 models. The maximum size of the images used for training the proposed framework is 32×32 pixels. The experiments are conducted using rescaled German Traffic Sign Recognition Benchmark dataset (GTSRB) and downsampled German Traffic Sign Detection Benchmark dataset (GTSDB). Unlike MS COCO and DOTA datasets, the resulting GTSDB turns out to be very challenging for any small object detection algorithm due to not only the size of objects of interest but the complex textures of the background as well. Our experimental results show that the proposed method detects small traffic signs with an average precision of 0.332 at the intersection over union of 0.3.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133576939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
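
The staged flow can be sketched as below. The ROI proposer and the SRGAN are replaced with stand-ins (a stub and bicubic upsampling) and the two ResNet50 stages with fresh torchvision models, so this mirrors only the control flow, not the trained system.

```python
# Control-flow sketch of the multistage idea (stand-ins throughout): propose
# ROIs, upsample them (bicubic here, SRGAN in the paper), then pass them
# through a two-stage cascade where stage 1 filters easy negatives.
import torch
import torch.nn.functional as F
import torchvision

stage1 = torchvision.models.resnet50(num_classes=2).eval()   # object vs. background
stage2 = torchvision.models.resnet50(num_classes=43).eval()  # e.g., GTSRB classes

def propose_rois(image):           # stub for the unsupervised ROI extractor
    return [image[:, :, 0:32, 0:32], image[:, :, 32:64, 32:64]]

@torch.no_grad()
def detect(image, t1=0.5):
    detections = []
    for roi in propose_rois(image):
        sr = F.interpolate(roi, scale_factor=7, mode="bicubic")  # 32 -> 224
        p_obj = stage1(sr).softmax(dim=1)[0, 1]
        if p_obj < t1:                       # cascade: reject cheaply
            continue
        cls = stage2(sr).softmax(dim=1).argmax(dim=1).item()
        detections.append((cls, float(p_obj)))
    return detections

image = torch.randn(1, 3, 64, 64)            # toy high-resolution frame
print(detect(image))
```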
On the use of synthetic images in deep learning for defect recognition in industrial infrastructures
Clément Mailhé, A. Ammar, F. Chinesta
{"title":"On the use of synthetic images in deep learning for defect recognition in industrial infrastructures","authors":"Clément Mailhé, A. Ammar, F. Chinesta","doi":"10.1145/3589572.3589584","DOIUrl":"https://doi.org/10.1145/3589572.3589584","url":null,"abstract":"The use of synthetic images in deep learning for object detection applications is recognized as a key technological lever in reducing time and cost constraints associated with data-driven processes. In this work, the applicability of training an instance recognition algorithm on a synthetic database in an industrial context is assessed based on the detection of dents in pipes. Photo-realistic artificial images are procedurally generated using a rendering software and used for the training of the YOLOv5 object recognition algorithm. Its prediction effectiveness is assessed on a small test set in different configurations to identify improvement steps towards the reliable use of artificial data in computer-vision.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127262188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
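
For readers who want to reproduce this kind of evaluation loop, a hedged sketch using the public ultralytics/yolov5 torch.hub entry point is below; the checkpoint path, image paths, and confidence threshold are placeholders, not values from the paper.

```python
# Hedged sketch: running a YOLOv5 checkpoint (e.g., one trained on synthetic
# renders of dented pipes) over a small real test set via torch.hub.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")  # placeholder path
model.conf = 0.40                      # assumed confidence threshold

results = model(["test/pipe_001.jpg", "test/pipe_002.jpg"])
results.print()                        # per-image detection summary
dents = results.pandas().xyxy[0]       # DataFrame of boxes for the first image
print(dents[["xmin", "ymin", "xmax", "ymax", "confidence"]])
```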
Automatically Design Lightweight Neural Architectures for Facial Expression Recognition
Xiaoyu Han
{"title":"Automatically Design Lightweight Neural Architectures for Facial Expression Recognition","authors":"Xiaoyu Han","doi":"10.1145/3589572.3589587","DOIUrl":"https://doi.org/10.1145/3589572.3589587","url":null,"abstract":"Facial expression recognition (FER) is a popular direction researched in the field of human-computer interaction. Recently, most of the work in the direction of FER are with the help of convolutional neutral networks (CNNs). However, most of the CNNs used for FER are designed by humans, and the design process is time-consuming and highly relies on the domain expertise. To address this problem, some methods are proposed based on neural architecture search (NAS), which can automatically design neural architectures. Nevertheless, those methods mainly focus on the accuracy of the recognition, but the model size of the designed architecture is often large, which limits the deployment of the architecture on devices with limited computing resources, such as mobile devices. In this paper, a novel approach named AutoFER-L is proposed for automatically designing lightweight CNNs for FER. Specifically, the accuracy of recognition and the model size are both considered in the objective functions, thus the resulting architectures can be both accurate and lightweight. We conduct experiments on CK+ and FER2013, which are popular benchmark datasets for FER. The experimental results show that the CNN architectures designed by the proposed method are more accurate and lighter than the handcrafted models and the models derived by standard NAS.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121551512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
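
One common way to encode the accuracy-plus-size trade-off the abstract describes is Pareto dominance over (validation accuracy, parameter count); the sketch below shows only that selection step, under those assumptions, and is not AutoFER-L's actual search.

```python
# Sketch of bi-objective selection for NAS candidates: maximize validation
# accuracy while minimizing parameter count. Pareto filtering only; the
# search loop, search space, and FER training are omitted.
import torch.nn as nn

def param_count(model: nn.Module) -> int:
    """Second objective for a real candidate architecture."""
    return sum(p.numel() for p in model.parameters())

def dominates(a, b):
    """a, b = (accuracy, params). a dominates b if it is no worse on both
    objectives and strictly better on at least one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """candidates: list of (name, accuracy, params)."""
    return [c for c in candidates
            if not any(dominates((o[1], o[2]), (c[1], c[2]))
                       for o in candidates if o is not c)]

pool = [("arch_a", 0.71, 2.1e6), ("arch_b", 0.69, 0.9e6), ("arch_c", 0.68, 1.5e6)]
print(pareto_front(pool))   # arch_c is dominated by arch_b and drops out
```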
Detection of Conversational Health in a Multimodal Conversation Graph by Measuring Emotional Concordance
Kruthika Suresh, Mayuri D Patil, Shrikar Madhu, Yousha Mahamuni, Bhaskarjyoti Das
{"title":"Detection of Conversational Health in a Multimodal Conversation Graph by Measuring Emotional Concordance","authors":"Kruthika Suresh, Mayuri D Patil, Shrikar Madhu, Yousha Mahamuni, Bhaskarjyoti Das","doi":"10.1145/3589572.3589588","DOIUrl":"https://doi.org/10.1145/3589572.3589588","url":null,"abstract":"With the advent of social media and technology, the increased connections between individuals and organizations have led to a similar increase in the number of conversations. These conversations, in most cases are bimodal in nature, consisting of both images and text. Existing work in multimodal conversation typically focuses on individual utterances rather than the overall dialogue. The aspect of conversational health is important in many real world conversational uses cases including the emerging world of Metaverse. The work described in this paper investigates conversational health from the viewpoint of emotional concordance in bimodal conversations modelled as graphs. Using this framework, an existing multimodal dialogue dataset has been reformatted as a graph dataset that is labelled with the emotional concordance score. In this work, determination of conversational health has been framed as a graph classification problem. A graph neural network based model using algorithms such as Graph Convolution Network and Graph Attention Network is then used to detect the emotional concordance or discordance based upon the multimodal conversation that is provided. The model proposed in this paper achieves an overall F1 Score of 0.71 for equally sized class training and testing size, which offers improved results compared to previous models using the same benchmark dataset.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123492524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
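
A compact sketch of graph-level classification in the spirit described follows: plain dense-adjacency GCN layers, mean pooling, and a binary concordance head. Feature sizes and normalization are assumptions; the paper's model operates on multimodal utterance features with GCN/GAT layers.

```python
# Minimal graph-classification sketch (not the paper's model): two dense
# GCN layers over a normalized adjacency, mean-pool node embeddings, then a
# binary concordance/discordance head.
import torch
import torch.nn as nn

def normalize_adj(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(dim=1).rsqrt()
    return d.unsqueeze(1) * a * d.unsqueeze(0)

class GraphClassifier(nn.Module):
    def __init__(self, in_dim=300, hid=64):
        super().__init__()
        self.g1 = nn.Linear(in_dim, hid)
        self.g2 = nn.Linear(hid, hid)
        self.head = nn.Linear(hid, 2)    # concordant vs. discordant

    def forward(self, x, adj):           # x: (nodes, in_dim), adj: (nodes, nodes)
        a = normalize_adj(adj)
        h = torch.relu(self.g1(a @ x))
        h = torch.relu(self.g2(a @ h))
        return self.head(h.mean(dim=0))  # mean pooling -> graph logits

x = torch.randn(12, 300)                 # 12 utterance nodes (text+image features)
adj = (torch.rand(12, 12) > 0.7).float()
adj = ((adj + adj.T) > 0).float()        # make the conversation graph undirected
logits = GraphClassifier()(x, adj)
```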
Vision-based mobile analysis of roadside guardrail structures
Csaba Beleznai, Kai Göbel, C. Stefan, P. Dorninger, A. Pusica
{"title":"Vision-based mobile analysis of roadside guardrail structures","authors":"Csaba Beleznai, Kai Göbel, C. Stefan, P. Dorninger, A. Pusica","doi":"10.1145/3589572.3589597","DOIUrl":"https://doi.org/10.1145/3589572.3589597","url":null,"abstract":"Vision-based analysis of the roadside infrastructure is a research field of growing relevance, since autonomous driving, roadside asset digitization and mapping are key emerging applications. The advancement of Deep Learning for vision-based environment perception represents a core enabling technology to interpret scenes in terms of its objects and their spatial relations. In this paper we present a multi-sensory mobile analysis systemic concept, which targets the structural classification of roadside guardrail structures, and allows for digital measurements within the scene surrounding the guardrail objects. We propose an RGB-D vision-based analysis pipeline to perform semantic segmentation and metric dimension estimation of key structural elements of a given guardrail segment. We demonstrate that the semantic segmentation task can be fully learned in the synthetic domain and deployed with a high accuracy in the real domain. Based on guardrail structural measurements aggregated and tracked over time, our pipeline estimates one or several type-labels for the observed guardrail structure, based on a prior catalog of all possible types. The paper presents qualitative and quantitative results from experiments using our measurement vehicle and covering 100km in total. Obtained results demonstrate that the presented mobile analysis framework can well delineate roadside guardrail structures spatially, and able to propose a limited set of type-candidates. The paper also discusses failure modes and possible future improvements towards accomplishing digital mapping and recognition of safety-critical roadside assets.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122736685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
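
The time-aggregation step lends itself to a simple vote. The sketch below turns per-frame type predictions into a small ranked candidate set with a minimum-support cut; the catalog names and thresholds are invented for illustration.

```python
# Illustration only: aggregate per-frame guardrail type votes, tracked over a
# road segment, into a ranked candidate set. Type names and thresholds are
# invented, not drawn from the paper's catalog.
from collections import Counter

def candidate_types(frame_predictions, top_k=3, min_share=0.10):
    """frame_predictions: list of type labels, one per processed frame."""
    counts = Counter(frame_predictions)
    total = sum(counts.values())
    ranked = [(label, n / total) for label, n in counts.most_common(top_k)]
    return [(label, share) for label, share in ranked if share >= min_share]

votes = ["type-1.33", "type-1.33", "type-2.0", "type-1.33", "unknown"]
print(candidate_types(votes))   # [('type-1.33', 0.6), ('type-2.0', 0.2), ('unknown', 0.2)]
```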
An Efficient Noisy Label Learning Method with Semi-supervised Learning
Jihee Kim, Sangki Park, Si-Dong Roh, Ki-Seok Chung
{"title":"An Efficient Noisy Label Learning Method with Semi-supervised Learning: An Efficient Noisy Label Learning Method with Semi-supervised Learning","authors":"Jihee Kim, Sangki Park, Si-Dong Roh, Ki-Seok Chung","doi":"10.1145/3589572.3589596","DOIUrl":"https://doi.org/10.1145/3589572.3589596","url":null,"abstract":"Even though deep learning models make success in many application areas, it is well-known that they are vulnerable to data noise. Therefore, researches on a model that detects and removes noisy data or the one that operates robustly against noisy data have been actively conducted. However, most existing approaches have limitations in either that important information could be left out while noisy data are cleaned up or that prior information on the dataset is required while such information may not be easily available. In this paper, we propose an effective semi-supervised learning method with model ensemble and parameter scheduling techniques. Our experiment results show that the proposed method achieves the best accuracy under 20% and 40% noise-ratio conditions. The proposed model is robust to data noise, suffering from only 2.08% of accuracy degradation when the noise ratio increases from 20% to 60% on CIFAR-10. We additionally perform an ablation study to verify net accuracy enhancement by applying one technique after another.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123341431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
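
A hedged sketch of the generic pattern the abstract points at: ensemble agreement selects a "clean" subset, and the remaining samples receive pseudo-labels gated by a scheduled confidence threshold. The models, schedule, and thresholds here are assumptions, not the paper's exact method.

```python
# Generic clean/noisy split sketch: samples on which an ensemble agrees with
# the given label are kept as clean; the rest are pseudo-labeled only when
# ensemble confidence beats a threshold scheduled upward during training.
import torch

def threshold_schedule(epoch, start=0.6, end=0.95, warmup=30):
    return end if epoch >= warmup else start + (end - start) * epoch / warmup

@torch.no_grad()
def split_batch(models, x, y, epoch):
    probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
    conf, pred = probs.max(dim=1)
    clean = pred.eq(y)                               # ensemble agrees with label
    pseudo = (~clean) & (conf > threshold_schedule(epoch))
    return clean, pseudo, pred                       # pred serves as pseudo-label

# Toy ensemble standing in for the trained models (CIFAR-10-shaped inputs).
models = [torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
          for _ in range(3)]
x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
clean, pseudo, pseudo_labels = split_batch(models, x, y, epoch=10)
```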
Predicting Stenosis in Coronary Arteries based on Deep Neural Network using Non-Contrast and Contrast Cardiac CT images
Masaki Aono, Testuya Asakawa, Hiroki Shinoda, K. Shimizu, T. Komoda
{"title":"Predicting Stenosis in Coronary Arteries based on Deep Neural Network using Non-Contrast and Contrast Cardiac CT images","authors":"Masaki Aono, Testuya Asakawa, Hiroki Shinoda, K. Shimizu, T. Komoda","doi":"10.1145/3589572.3589595","DOIUrl":"https://doi.org/10.1145/3589572.3589595","url":null,"abstract":"In this paper, we demonstrate two different methods to predict stenosis, given non-contrast and contrast heart CT scan images, respectively. As far as we know, non-contrast heart CT images have been hardly used for predicting stenosis, since non-contrast CT images generally do not show the coronary arteries (LCX, LAD, RCA, LMT) distinctively. However, if it is possible to predict stenosis with non-contrast CT images, we believe it is beneficial for patients because they do not suffer from side effects of contrast agents. Our demonstration for non-contrast CT image depends upon the relationship between calcification and stenosis. According to physicians, 90% of stenosis accompanies calcification in coronary arteries. On the other hand, we have also conducted experiments with contrast heart CT scan images, where coronary arteries are rendered as “straightened circumferentially”. This second approach using contrast CT image can be reduced to binary classification problem. From our experiments, we demonstrate that our two approaches defined as multi-label, multi-class classification problem using non-contrast CT images and binary classification problem using contrast CT images, respectively, with deep neural networks as classifiers, are very promising. We also note that our data in non-contrast and contrast CT images have both able-bodied (or healthy) subjects as well as patients, which makes us believe it is practical when the methods are incorporated into supporting a real stenosis diagnosis system.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121281955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
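
The non-contrast branch the authors describe is a multi-label problem over the four coronary arteries; a minimal head for that formulation is sketched below, with the backbone, input handling, and decision threshold chosen as assumptions rather than taken from the paper.

```python
# Sketch of the multi-label formulation for the non-contrast branch (not the
# paper's network): one sigmoid logit per coronary artery, trained with
# BCEWithLogitsLoss.
import torch
import torch.nn as nn
import torchvision

ARTERIES = ["LCX", "LAD", "RCA", "LMT"]

model = torchvision.models.resnet18(weights=None)          # assumed backbone
model.fc = nn.Linear(model.fc.in_features, len(ARTERIES))  # 4 independent logits
criterion = nn.BCEWithLogitsLoss()

ct_slices = torch.randn(8, 3, 224, 224)        # CT slices replicated to 3 channels
targets = torch.randint(0, 2, (8, 4)).float()  # per-artery stenosis labels
loss = criterion(model(ct_slices), targets)

probs = torch.sigmoid(model(ct_slices))
flagged = {a: probs[0, i].item() > 0.5 for i, a in enumerate(ARTERIES)}
```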