Minh Dinh, Vu L. Bui, Doanh C. Bui, Duong Phi Long, Nguyen D. Vo, Khang Nguyen
{"title":"Performance Evaluation of Optimizers for Deformable-DETR in Natural Disaster Damage Assessment","authors":"Minh Dinh, Vu L. Bui, Doanh C. Bui, Duong Phi Long, Nguyen D. Vo, Khang Nguyen","doi":"10.1109/MAPR56351.2022.9924933","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924933","url":null,"abstract":"Global natural disasters are becoming more frequent and severe as a result of climate change. Recent advances in computer vision, particularly deep learning-based techniques and unmanned aerial vehicle (UAV) remote sensing, can aid disaster response teams in assessing the damage. Prior methods appear to be ineffective or were designed with inductive biases, making them difficult to apply during disaster damage assessment. In this paper, we investigate deep-learning-based methods capable of rapidly assessing building damage that follows natural disasters. Furthermore, we examine Deformable DETR, which improves upon DETR, an object detection method based on the Transformer architecture, in efficiency and convergence time while inheriting DETR’s simple implementation and adaptable architecture, making it suitable for the task of damage detection. We also experiment with several optimizers and analyze their effect on the performance of Deformable DETR.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122409666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Nguyen, Hong-Quan Nguyen, Thuy-Binh Nguyen, Van-Chien Pham, Thi-Lan Le
{"title":"Exploiting matching local information for person re-identification","authors":"H. Nguyen, Hong-Quan Nguyen, Thuy-Binh Nguyen, Van-Chien Pham, Thi-Lan Le","doi":"10.1109/MAPR56351.2022.9924686","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924686","url":null,"abstract":"Person re-identification, whose main aim is to associate instances of the same person captured by different cameras in a surveillance camera network, usually relies on detection results. As a consequence, misalignment of detected bounding boxes and background information are the two main factors that reduce the performance of person re-identification. To tackle these challenges, state-of-the-art person re-identification methods employ attention mechanisms or body part detection. However, these methods have high complexity and computational cost, which can be reduced by using Earth Mover's Distance (EMD) instead. Therefore, this paper formulates local matching as a distance calculation between two probability distributions and applies EMD to compute the optimal matching between two sets of stripes in order to address an issue in the AlignedReID++ method. Experiments have been conducted on both single-shot and multi-shot person re-identification. The results show improved performance of the proposed method compared with the baseline method. The rank-1 matching rates obtained by the proposed method are 49.59%, 83.36%, and 78.47% on VIPeR, Market1501-Partial, and DukeMTMCReID-Partial, respectively.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121793452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Doanh C. Bui, N. Nguyen, Nguyen D. Vo, Uyen Han Thuy Thai, Khang Nguyen
{"title":"Vi-DRSNet: A Novel Hybrid Model for Vietnamese Image Captioning in Healthcare Domain","authors":"Doanh C. Bui, N. Nguyen, Nguyen D. Vo, Uyen Han Thuy Thai, Khang Nguyen","doi":"10.1109/MAPR56351.2022.9924781","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924781","url":null,"abstract":"Image Captioning is an exciting topic that attracts the research community from both computer vision and natural language processing fields. In this paper, we present a novel hybrid model, which is an effective combination of three modules: Dual-level Collaborative, Meshed-memory Decoder and Adaptive Decoder. In detail, we use Dual-level Collaborative for integrating grid features and region features. Besides, Meshed-memory Decoder is also employed to take advantage of all encoder outputs. Finally, the idea of an Adaptive Decoder is applied for embedding the Vietnamese linguistic aspect into decoding steps. Our approach achieves competitive results compared to other methods on the public and private tests of the VieCap4H benchmark without using any data augmentation method.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124379024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Than, Duc Khanh Duy Danh, Huu Luong Nguyen, Minh-Son Nguyen
{"title":"Researching and Implementing the Posture Recognition Algorithm of the Elderly on Jetson Nano","authors":"T. Than, Duc Khanh Duy Danh, Huu Luong Nguyen, Minh-Son Nguyen","doi":"10.1109/MAPR56351.2022.9924968","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924968","url":null,"abstract":"Falls are common among the elderly. Falling not only causes serious physiological injuries such as fractures and head injuries, but also causes psychological damage to the elderly. In addition to prevention, detecting falls in a timely manner can help limit their consequences. In this paper, we present a fall detection method for the elderly using a neural network on Jetson Nano. The fall recognition model is built on a Convolutional Neural Network (CNN) deep learning model. The model performs body shape recognition and body recognition, and integrates a trained OpenPose model that extracts human body parts, from which behavior is predicted by a feed-forward network (FFN). Experimental results on a real data set collected by us show that the proposed model is suitable for detecting falls in the elderly, with an accuracy of 89.07% and a frame rate of 2.49 frames per second (FPS) on the Jetson Nano.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129994608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manh-Khanh Ngo Huu, V. Ngo, Thanh-Danh Nguyen, Vinh-Tiep Nguyen, T. Ngo
{"title":"Antique Photo Restoration and Colorization via Generative Model","authors":"Manh-Khanh Ngo Huu, V. Ngo, Thanh-Danh Nguyen, Vinh-Tiep Nguyen, T. Ngo","doi":"10.1109/MAPR56351.2022.9924704","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924704","url":null,"abstract":"In the past, many photographs of famous historical figures and moments were captured in black and white. Those photos are often degraded by the limitations of old cameras and by poor storage conditions. Restoring and colorizing those images can bring history to life. Since manually retouching images is time-consuming and difficult for people without an aesthetic sense, many researchers have proposed models that automatically remove the artifacts in old photos. However, these methods solve either image restoration or colorization, and so cannot fully address the task of image retouching. Consequently, in this work, we propose an effective end-to-end framework, named AIRC, for image retouching. Besides, previous works often use synthesized old photos for training, but these pseudo datasets cannot exactly replicate real antique photos, which prevents the trained model from being used in practice. To this end, we also introduce a new antique synthetic dataset, namely OldifiedScenes, that resembles real old photos by blending with paper and artifact textures. Quantitative and qualitative results are provided to demonstrate the effectiveness of our proposed method.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126963511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-quality 3D Clothing Reconstruction and Virtual-Try-On: Pants case","authors":"Thanh Tuan Thai, Youngsik Yun, Heejune Ahn","doi":"10.1109/MAPR56351.2022.9924990","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924990","url":null,"abstract":"Virtual try-on (VTON) is filling the gap between online and offline shopping. This paper extends Cloth3D, which uses top clothing only, and proposes a pipeline for high-resolution virtual try-on for pants based on 3D clothing reconstruction. An in-shop pants image is first reconstructed into 3D by finding the SMPL body model fitted to the pants and building the clothing model. Then, the clothing model is reposed to the human reference image and projected to a 2D image to get 3D warped pants. These warped pants and the identities from the reference person image are passed through the blending network to get the try-on result. Moreover, a target segmentation is also estimated as a control input for the blending (in-painting) network. Our experiments and evaluation on a new fashion dataset show natural VTON results for service.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130646592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layout-invariant license plate detection and recognition","authors":"Thi-Anh-Loan Trinh, T. Pham, Van-Dung Hoang","doi":"10.1109/MAPR56351.2022.9924802","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924802","url":null,"abstract":"Many current automatic license plate (LP) recognition systems are designed to handle a fixed form of LPs. In the present work, we develop an effective system using deep convolutional neural networks (CNNs) that can process LPs with different layouts (e.g., variable character lengths, diverse colors, square-like and rectangular shapes). First, we gather a sufficiently large and diverse Vietnamese LP dataset and manually annotate the images. Second, a CNN model is derived to detect the LPs in images and predict the LP’s shape (i.e., one-row or two-row form). Third, we design an efficient and unified CNN model to predict the characters of an input LP image patch. The proposed system has been extensively validated on two datasets (Vietnamese and Chinese LPs), demonstrating promising accuracy (e.g., 95% – 99%) and real-time CPU inference in comparison with the state-of-the-art approaches.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116085798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Nguyen, Tai Quang Dinh Nguyen, Hien Nguyen Thi, B. Lap, Thi-Thu-Hong Phan
{"title":"The Use of Machine Learning Algorithms for Evaluating Water Quality Index: A Survey and Perspective","authors":"H. Nguyen, Tai Quang Dinh Nguyen, Hien Nguyen Thi, B. Lap, Thi-Thu-Hong Phan","doi":"10.1109/MAPR56351.2022.9924736","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924736","url":null,"abstract":"The quality of water is determined by its components, called the water parameters. Each parameter affects the water quality differently. To assess the water quality, these parameters must be sampled and measured. The water quality index (WQI) is a special indicator that integrates the values of many parameters into a single value. This value can be used to effectively reflect the quality of water. In this study, we present a survey on the application of machine learning (ML) methods to estimate the WQI. A case study is also conducted to illustrate the use of ML algorithms in this context.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130405696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Doan Duy, B. H. Hoang, Duy Xuan Bach Nguyen, Toan Nguyen Mau
{"title":"An Implementation of Low-Cost Auto-Balancing Embedded System for Safety Mechanisms*","authors":"Doan Duy, B. H. Hoang, Duy Xuan Bach Nguyen, Toan Nguyen Mau","doi":"10.1109/MAPR56351.2022.9924743","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924743","url":null,"abstract":"Auto-balancing systems were introduced decades ago and are widely applied in industry. However, their use in everyday applications is still limited due to their high cost. This paper introduces an implementation of a low-cost auto-balancing embedded system for typical applications such as safety chairs for kids and soft chairs for patients. To this end, we apply the 3-DOF Stewart model to design the balancing mechanism, with an inverse kinematics algorithm and a Kalman filter for high control accuracy. The implemented system is evaluated in different object-balancing scenarios and shows a relatively fast response, which demonstrates the potential of this research for the targeted applications.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128207096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Nguyen, Thi-Thu-Hong Le, Hoang-Bach Nguyen, Thanh-Tung Phan, Chi-Thanh Nguyen, Hai Vu
{"title":"Improving the Hand Pose Estimation from Egocentric Vision via HOPE-Net and Mask R-CNN","authors":"S. Nguyen, Thi-Thu-Hong Le, Hoang-Bach Nguyen, Thanh-Tung Phan, Chi-Thanh Nguyen, Hai Vu","doi":"10.1109/MAPR56351.2022.9924768","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924768","url":null,"abstract":"Hand pose estimation is the task of predicting the position and orientation of the hand and fingers relative to some coordinate system. It is an important task or input for applications in robotics, medicine, and human-computer interaction. In recent years, the success of deep convolutional neural networks and the popularity of low-cost consumer wearable cameras have made hand pose estimation on egocentric images using deep neural networks a hot topic in the computer vision field. This paper proposes a novel deep model for accurate 2D hand pose estimation that combines HOPE-Net, which estimates hand pose, and Mask R-CNN, which provides hand detection and segmentation to localize the hand in the image. First, HOPE-Net is used to predict the initial 2D hand pose, and the hand features are extracted from an image with a hand in the center, which is cropped from the original image based on Mask R-CNN’s output. Then, we feed the initial 2D hand pose and the hand features into a fully connected layer to predict the 2D hand pose more accurately. Our experiments show that the proposed model outperforms the original HOPE-Net in 2D hand pose estimation. The proposed method’s mean endpoint error (mEPE) is 48.82 pixels, while the mEPE of the 2D HOPE-Net predictor is 86.30 pixels on the First-Person Hand Action dataset.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134391127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}