{"title":"Object-Based Vehicle Color Recognition in Uncontrolled Environment","authors":"Panumate Chetprayoon, Theerat Sakdejayont, Monchai Lertsutthiwong","doi":"10.1145/3589572.3589585","DOIUrl":"https://doi.org/10.1145/3589572.3589585","url":null,"abstract":"The demand for vehicle recognition significantly increases with impact on many businesses in recent decades. This paper focuses on a vehicle color attribute. A novel method for vehicle color recognition is introduced to overcome three challenges of vehicle color recognition. The first challenge is an uncontrolled environment such as shadow, brightness, and reflection. Second, similar color is hard to be taken into account. Third, few research works dedicate to multi-color vehicle recognition. Previous works can provide only color information of the whole vehicle, but not at vehicle part level. In this study, a new approach for recognizing the colors of vehicles at the part level is introduced. It utilizes object detection techniques to identify the colors based on the different objects (e.g. parts of a vehicle in this research). In addition, a novel generic post-processing is proposed to improve robustness in the uncontrolled environment and support not only single-color but also multi-color vehicles. Experimental results show that it can effectively identify the color under the three challenges addressed above with 99 % accuracy for single-color vehicle and outperforms the other seven baseline models, and 76 % accuracy for multi-color vehicle.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114827819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SkeletonGAN: Fine-Grained Pose Synthesis of Human-Object Interactions","authors":"Qixuan Sun, Nanxi Chen, Ruipeng Zhang, Jiamao Li, Xiaolin Zhang","doi":"10.1145/3589572.3589579","DOIUrl":"https://doi.org/10.1145/3589572.3589579","url":null,"abstract":"Synthesizing Human-Object Interactions (HOI) is a challenging problem since the human body has a complex and versatile representation. Existing solutions can generate individual objects or faces very well but still face difficulty in generating realistic human bodies and their interaction with multiple objects. In this work, we focus on synthesizing human poses based on HOI descriptive triplets and introduce a novel perspective that decomposes every action between humans and objects into sub-actions of human body parts to generate body poses in a fine-grained way. We propose SkeletonGAN, a conditional generative adversarial model to perform a body-parts-level control over the interaction between humans and objects. SkeletonGAN is trained and evaluated using the HICO-DET dataset, which is a knowledge base consisting of complex interaction poses of various human-object actions in realistic scenarios. We show through qualitative and quantitative evaluations that this model is capable of generating diverse and plausible poses consistent with the given semantic features, and especially our model can also predict the relative position of the object with the body pose. We also explore synthesizing composite poses that include co-occurring human actions, indicating that the model can learn multimodal relationships between human poses and the given conditional semantic features.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130193441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Feature Enhancement Network for Face Forgery Detection","authors":"Zhiyuan Ma, Xue Mei, Hao Chen, Jienan Shen","doi":"10.1145/3589572.3589577","DOIUrl":"https://doi.org/10.1145/3589572.3589577","url":null,"abstract":"Nowadays, synthesizing realistic fake face images and videos becomes easy benefiting from the advance in generation technology. With the popularity of face forgery, abuse of the technology occurs from time to time, which promotes the research on face forgery detection to be an emergency. To deal with the potential risks, we propose a face forgery detection method based on multi-scale feature enhancement. Specifically, we analyze the forgery traces from the perspective of texture and frequency domain, respectively. We find that forgery traces are hard to be perceived by human eyes but noticeable in shallow layers of CNNs and middle-frequency domain and high-frequency domain. Hence, to reserve more forgery information, we design a texture feature enhancement module and a frequency domain feature enhancement module, respectively. The experiments on FaceForensics++ dataset and Celeb-DF dataset show that our method exceeds most existing networks and methods, which proves that our method has strong classification ability.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124593534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multistage Framework for Detection of Very Small Objects","authors":"Duleep Rathgamage Don, Ramazan S. Aygun, M. Karakaya","doi":"10.1145/3589572.3589574","DOIUrl":"https://doi.org/10.1145/3589572.3589574","url":null,"abstract":"Small object detection is one of the most challenging problems in computer vision. Algorithms based on state-of-the-art object detection methods such as R-CNN, SSD, FPN, and YOLO fail to detect objects of very small sizes. In this study, we propose a novel method to detect very small objects, smaller than 8×8 pixels, that appear in a complex background. The proposed method is a multistage framework consisting of an unsupervised algorithm and three separately trained supervised algorithms. The unsupervised algorithm extracts ROIs from a high-resolution image. Then the ROIs are upsampled using SRGAN, and the enhanced ROIs are detected by our two-stage cascade classifier based on two ResNet50 models. The maximum size of the images used for training the proposed framework is 32×32 pixels. The experiments are conducted using rescaled German Traffic Sign Recognition Benchmark dataset (GTSRB) and downsampled German Traffic Sign Detection Benchmark dataset (GTSDB). Unlike MS COCO and DOTA datasets, the resulting GTSDB turns out to be very challenging for any small object detection algorithm due to not only the size of objects of interest but the complex textures of the background as well. Our experimental results show that the proposed method detects small traffic signs with an average precision of 0.332 at the intersection over union of 0.3.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133576939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the use of synthetic images in deep learning for defect recognition in industrial infrastructures","authors":"Clément Mailhé, A. Ammar, F. Chinesta","doi":"10.1145/3589572.3589584","DOIUrl":"https://doi.org/10.1145/3589572.3589584","url":null,"abstract":"The use of synthetic images in deep learning for object detection applications is recognized as a key technological lever in reducing time and cost constraints associated with data-driven processes. In this work, the applicability of training an instance recognition algorithm on a synthetic database in an industrial context is assessed based on the detection of dents in pipes. Photo-realistic artificial images are procedurally generated using a rendering software and used for the training of the YOLOv5 object recognition algorithm. Its prediction effectiveness is assessed on a small test set in different configurations to identify improvement steps towards the reliable use of artificial data in computer-vision.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127262188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically Design Lightweight Neural Architectures for Facial Expression Recognition","authors":"Xiaoyu Han","doi":"10.1145/3589572.3589587","DOIUrl":"https://doi.org/10.1145/3589572.3589587","url":null,"abstract":"Facial expression recognition (FER) is a popular direction researched in the field of human-computer interaction. Recently, most of the work in the direction of FER are with the help of convolutional neutral networks (CNNs). However, most of the CNNs used for FER are designed by humans, and the design process is time-consuming and highly relies on the domain expertise. To address this problem, some methods are proposed based on neural architecture search (NAS), which can automatically design neural architectures. Nevertheless, those methods mainly focus on the accuracy of the recognition, but the model size of the designed architecture is often large, which limits the deployment of the architecture on devices with limited computing resources, such as mobile devices. In this paper, a novel approach named AutoFER-L is proposed for automatically designing lightweight CNNs for FER. Specifically, the accuracy of recognition and the model size are both considered in the objective functions, thus the resulting architectures can be both accurate and lightweight. We conduct experiments on CK+ and FER2013, which are popular benchmark datasets for FER. The experimental results show that the CNN architectures designed by the proposed method are more accurate and lighter than the handcrafted models and the models derived by standard NAS.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121551512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of Conversational Health in a Multimodal Conversation Graph by Measuring Emotional Concordance","authors":"Kruthika Suresh, Mayuri D Patil, Shrikar Madhu, Yousha Mahamuni, Bhaskarjyoti Das","doi":"10.1145/3589572.3589588","DOIUrl":"https://doi.org/10.1145/3589572.3589588","url":null,"abstract":"With the advent of social media and technology, the increased connections between individuals and organizations have led to a similar increase in the number of conversations. These conversations, in most cases are bimodal in nature, consisting of both images and text. Existing work in multimodal conversation typically focuses on individual utterances rather than the overall dialogue. The aspect of conversational health is important in many real world conversational uses cases including the emerging world of Metaverse. The work described in this paper investigates conversational health from the viewpoint of emotional concordance in bimodal conversations modelled as graphs. Using this framework, an existing multimodal dialogue dataset has been reformatted as a graph dataset that is labelled with the emotional concordance score. In this work, determination of conversational health has been framed as a graph classification problem. A graph neural network based model using algorithms such as Graph Convolution Network and Graph Attention Network is then used to detect the emotional concordance or discordance based upon the multimodal conversation that is provided. The model proposed in this paper achieves an overall F1 Score of 0.71 for equally sized class training and testing size, which offers improved results compared to previous models using the same benchmark dataset.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123492524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vision-based mobile analysis of roadside guardrail structures","authors":"Csaba Beleznai, Kai Göbel, C. Stefan, P. Dorninger, A. Pusica","doi":"10.1145/3589572.3589597","DOIUrl":"https://doi.org/10.1145/3589572.3589597","url":null,"abstract":"Vision-based analysis of the roadside infrastructure is a research field of growing relevance, since autonomous driving, roadside asset digitization and mapping are key emerging applications. The advancement of Deep Learning for vision-based environment perception represents a core enabling technology to interpret scenes in terms of its objects and their spatial relations. In this paper we present a multi-sensory mobile analysis systemic concept, which targets the structural classification of roadside guardrail structures, and allows for digital measurements within the scene surrounding the guardrail objects. We propose an RGB-D vision-based analysis pipeline to perform semantic segmentation and metric dimension estimation of key structural elements of a given guardrail segment. We demonstrate that the semantic segmentation task can be fully learned in the synthetic domain and deployed with a high accuracy in the real domain. Based on guardrail structural measurements aggregated and tracked over time, our pipeline estimates one or several type-labels for the observed guardrail structure, based on a prior catalog of all possible types. The paper presents qualitative and quantitative results from experiments using our measurement vehicle and covering 100km in total. Obtained results demonstrate that the presented mobile analysis framework can well delineate roadside guardrail structures spatially, and able to propose a limited set of type-candidates. The paper also discusses failure modes and possible future improvements towards accomplishing digital mapping and recognition of safety-critical roadside assets.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122736685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Noisy Label Learning Method with Semi-supervised Learning: An Efficient Noisy Label Learning Method with Semi-supervised Learning","authors":"Jihee Kim, Sangki Park, Si-Dong Roh, Ki-Seok Chung","doi":"10.1145/3589572.3589596","DOIUrl":"https://doi.org/10.1145/3589572.3589596","url":null,"abstract":"Even though deep learning models make success in many application areas, it is well-known that they are vulnerable to data noise. Therefore, researches on a model that detects and removes noisy data or the one that operates robustly against noisy data have been actively conducted. However, most existing approaches have limitations in either that important information could be left out while noisy data are cleaned up or that prior information on the dataset is required while such information may not be easily available. In this paper, we propose an effective semi-supervised learning method with model ensemble and parameter scheduling techniques. Our experiment results show that the proposed method achieves the best accuracy under 20% and 40% noise-ratio conditions. The proposed model is robust to data noise, suffering from only 2.08% of accuracy degradation when the noise ratio increases from 20% to 60% on CIFAR-10. We additionally perform an ablation study to verify net accuracy enhancement by applying one technique after another.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123341431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Stenosis in Coronary Arteries based on Deep Neural Network using Non-Contrast and Contrast Cardiac CT images","authors":"Masaki Aono, Testuya Asakawa, Hiroki Shinoda, K. Shimizu, T. Komoda","doi":"10.1145/3589572.3589595","DOIUrl":"https://doi.org/10.1145/3589572.3589595","url":null,"abstract":"In this paper, we demonstrate two different methods to predict stenosis, given non-contrast and contrast heart CT scan images, respectively. As far as we know, non-contrast heart CT images have been hardly used for predicting stenosis, since non-contrast CT images generally do not show the coronary arteries (LCX, LAD, RCA, LMT) distinctively. However, if it is possible to predict stenosis with non-contrast CT images, we believe it is beneficial for patients because they do not suffer from side effects of contrast agents. Our demonstration for non-contrast CT image depends upon the relationship between calcification and stenosis. According to physicians, 90% of stenosis accompanies calcification in coronary arteries. On the other hand, we have also conducted experiments with contrast heart CT scan images, where coronary arteries are rendered as “straightened circumferentially”. This second approach using contrast CT image can be reduced to binary classification problem. From our experiments, we demonstrate that our two approaches defined as multi-label, multi-class classification problem using non-contrast CT images and binary classification problem using contrast CT images, respectively, with deep neural networks as classifiers, are very promising. We also note that our data in non-contrast and contrast CT images have both able-bodied (or healthy) subjects as well as patients, which makes us believe it is practical when the methods are incorporated into supporting a real stenosis diagnosis system.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121281955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}