Khanh Nguyen Quoc, Dan Pham Van, Van Pham Thi Bich
{"title":"An efficient method to improve the accuracy of Vietnamese vehicle license plate recognition in unconstrained environment","authors":"Khanh Nguyen Quoc, Dan Pham Van, Van Pham Thi Bich","doi":"10.1109/MAPR53640.2021.9585279","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585279","url":null,"abstract":"Background: Most previous studies in automatic license plate recognition (ALPR) focused on recognizing license plates (LPs) in constrained environments, where cameras are installed in front of LPs and other conditions such as lighting, weather, and image quality are satisfied. Besides, recent studies on ALPR in Vietnam have been conducted on small datasets and have not covered various cases of Vietnamese LPs. Aim: To develop a model for ALPR that is effective in unconstrained environments in Vietnam. Method: We propose two improvements: we apply the idea of key-point detection to the LP detection part, and use a segmentation-free approach based on an encoder-decoder network for the LP optical character recognition (OCR) part. We train and evaluate the models on a large dataset collected in unconstrained environments. Results: Our results show improvements in LP detection accuracy, with mean IoU mIoU = 95.01% and precision P75 = 99.5%. The accuracy in LP OCR was up to Accseq = 99.28% at sequence level and Accchar = 99.7% at character level. Conclusion: We provide a large dataset of Vietnamese LP images that can be effectively used to evaluate ALPR systems in Vietnam, and propose improvement techniques to tackle problems of ALPR in unconstrained environments in Vietnam.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122782576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
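For readers unfamiliar with the reported detection metrics, mean IoU and P75 can be sketched as follows; the (x1, y1, x2, y2) box format and the one-to-one matching of predictions to ground truth are illustrative assumptions, not details taken from the paper:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def mean_iou(pred_boxes, gt_boxes):
    """Average IoU over matched prediction/ground-truth pairs (mIoU)."""
    return sum(iou(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)

def precision_at(pred_boxes, gt_boxes, thresh=0.75):
    """Fraction of predictions whose IoU meets the threshold (P75 when thresh=0.75)."""
    hits = sum(iou(p, g) >= thresh for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(pred_boxes)
```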
Duc-Tuan Luu, Qu Dong, Lam M. Nguyen, Ngoc-Khanh Nguyen, Tiep V. Nguyen, M. Tran
{"title":"Beauty Moment Rendering via Face Happiness Scoring","authors":"Duc-Tuan Luu, Qu Dong, Lam M. Nguyen, Ngoc-Khanh Nguyen, Tiep V. Nguyen, M. Tran","doi":"10.1109/MAPR53640.2021.9585257","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585257","url":null,"abstract":"Nowadays, with the rapid development of technology and an energetic lifestyle, people tend to use electronic devices to capture moments. Thus, the need for a delightful selfie is increasing. This work introduces a Beauty Moment Rendering via Face Happiness Scoring framework, which aims to generate a key-frame that summarizes a short selfie video. Given a set of consecutive photos as input, our approach selects a single image and renders a face for each person at their happiest moment. To select a suitable key-frame, we combine eye and emotion information into an aggregated score. First, the method detects and tracks the faces in the video frames. Then, each person's highest-scoring face is rendered on the resulting key-frame, with eye gaze redirected by a generative model. To the best of our knowledge, we are among the first to address this problem, which comprises video summarization and face beautification.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127532704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
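A minimal sketch of the key-frame selection step described above; the per-face cues (eye openness, happiness) and the weights of the aggregated score are hypothetical placeholders for whatever the framework actually measures:

```python
def aggregated_score(eye_openness, happiness, w_eye=0.4, w_emotion=0.6):
    """Combine eye and emotion cues into one score; the weights are illustrative."""
    return w_eye * eye_openness + w_emotion * happiness

def select_key_frame(frames):
    """frames: one list of (eye_openness, happiness) tuples per frame, one tuple
    per detected face. Returns the index of the frame with the highest total score."""
    totals = [sum(aggregated_score(e, h) for e, h in faces) for faces in frames]
    return max(range(len(totals)), key=totals.__getitem__)
```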
Manh-Khanh Ngo Huu, Sy-Tuyen Ho, Vinh-Tiep Nguyen, T. Ngo
{"title":"Multilingual-GAN: A Multilingual GAN-based Approach for Handwritten Generation","authors":"Manh-Khanh Ngo Huu, Sy-Tuyen Ho, Vinh-Tiep Nguyen, T. Ngo","doi":"10.1109/MAPR53640.2021.9585285","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585285","url":null,"abstract":"Handwritten Text Recognition (HTR) is a difficult problem because of the diversity of calligraphic styles. To enhance the accuracy of HTR systems, a large amount of training data is required. Previous methods aim at generating handwritten images from input strings via RNN models such as LSTM or GRU. However, these methods require a predefined alphabet corresponding to a given language, so they cannot adapt well to new languages. To address this problem, we propose an Image2Image-based method named Multilingual-GAN, which translates a printed text image into a handwritten-style one. The main advantage of this approach is that the model does not depend on any language's alphabet. Therefore, our model can be used on a new language without re-training on a new dataset. The quantitative results demonstrate that our proposed method outperforms other state-of-the-art models. Code is available at https://github.com/HoSyTuyen/MultilingualGAN","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127189560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ta Thi Kim Hue, N. T. Linh, Minh Nguyen-Duc, T. Hoang
{"title":"Data Hiding in Bit-plane Medical Image Using Chaos-based Steganography","authors":"Ta Thi Kim Hue, N. T. Linh, Minh Nguyen-Duc, T. Hoang","doi":"10.1109/MAPR53640.2021.9585243","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585243","url":null,"abstract":"This paper proposes digital medical image steganographic schemes based on chaotic maps. Logistic and Cat maps are applied to choose the positions in cover images where the bits of the secret message will be hidden. Our schemes provide mutual substitution methods in a chaotic manner to choose pseudo-random positions. Three spatial steganographic algorithms, called ISLM, ISCM, and ISLCM, use an insertion methodology that provides a large embedding capacity and imperceptible stego images. The combination of a Logistic map and a Cat map keeps the inserted message secure against message-recovery attacks. The schemes are efficient and satisfy high security requirements.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123354870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
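As an illustration of chaos-based position selection, the sketch below drives LSB embedding with a logistic map alone; the map parameters, and the omission of the Cat map that the paper also uses, are simplifications:

```python
def logistic_positions(n_positions, count, x0=0.7, r=3.99):
    """Pick `count` distinct positions in [0, n_positions) by iterating the
    logistic map x -> r*x*(1-x) in its chaotic regime (r close to 4)."""
    x, seen, order = x0, set(), []
    while len(order) < count:
        x = r * x * (1 - x)
        pos = int(x * n_positions)
        if pos not in seen:
            seen.add(pos)
            order.append(pos)
    return order

def embed_bits(pixels, bits, positions):
    """Hide each message bit in the least significant bit of the chosen pixel."""
    out = list(pixels)
    for bit, pos in zip(bits, positions):
        out[pos] = (out[pos] & ~1) | bit
    return out

def extract_bits(pixels, positions):
    """Recover the hidden bits from a stego image given the same chaotic positions."""
    return [pixels[pos] & 1 for pos in positions]
```

Both sides regenerate the same position sequence from the shared seed (x0, r), which acts as the stego key.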
Viet-Duc Le, Van-Nam Hoang, Tien Nguyen, Van-Hung Le, Thanh-Hai Tran, Hai Vu, Thi-Lan Le
{"title":"A Unified Deep Framework for Hand Pose Estimation and Dynamic Hand Action Recognition from First-Person RGB Videos","authors":"Viet-Duc Le, Van-Nam Hoang, Tien Nguyen, Van-Hung Le, Thanh-Hai Tran, Hai Vu, Thi-Lan Le","doi":"10.1109/MAPR53640.2021.9585280","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585280","url":null,"abstract":"Understanding hand actions from first-person video has emerged recently thanks to wide potential applications such as hand rehabilitation and augmented reality. The majority of works mainly rely on RGB images. Compared with RGB images, hand joints have certain advantages, as they are robust to illumination and appearance variation. However, previous works on hand action recognition usually employed hand joints that are manually determined. This paper presents a unified framework for both hand pose estimation and hand action recognition from first-person RGB images. First, our framework estimates 3D hand joints from every RGB image using a combination of a ResNet and a graph convolutional network. Then, an adaptation of PA-ResGCN, a state-of-the-art method for human skeletons, is proposed for hand action recognition from the estimated hand joints. Our framework takes advantage of efficient graph networks to model the graph-like structure of the human hand in both phases: hand pose estimation and hand action recognition. We evaluate the proposed framework on the First Person Hand Action Benchmark (FPHAB). The experiments show that the proposed framework outperforms different state-of-the-art methods on both the hand pose estimation and hand action recognition tasks.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
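The graph-like hand structure such a framework exploits can be sketched as an adjacency matrix; the 21-joint layout (wrist plus five 4-joint finger chains) and the symmetric normalization are standard graph-convolution conventions, not specifics from the paper:

```python
def hand_adjacency(num_joints=21):
    """Adjacency of a 21-joint hand: joint 0 is the wrist; each finger is a
    chain of 4 joints (1-4 thumb, 5-8 index, ...) rooted at the wrist."""
    A = [[0] * num_joints for _ in range(num_joints)]
    for finger in range(5):
        base = 1 + 4 * finger
        A[0][base] = A[base][0] = 1          # wrist to finger base
        for j in range(base, base + 3):
            A[j][j + 1] = A[j + 1][j] = 1    # along the finger chain
    return A

def normalized_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by graph convolutions."""
    n = len(A)
    A_hat = [[A[i][j] + (i == j) for j in range(n)] for i in range(n)]
    d = [sum(row) ** -0.5 for row in A_hat]
    return [[d[i] * A_hat[i][j] * d[j] for j in range(n)] for i in range(n)]
```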
Thi-Oanh Ha, Hoang-Nhat Tran, Hong-Quan Nguyen, Thanh-Hai Tran, Phuong-Dung Nguyen, H. Doan, V. Nguyen, Hai Vu, Thi-Lan Le
{"title":"Improvement of People Counting by Pairing Head and Face Detections from Still Images","authors":"Thi-Oanh Ha, Hoang-Nhat Tran, Hong-Quan Nguyen, Thanh-Hai Tran, Phuong-Dung Nguyen, H. Doan, V. Nguyen, Hai Vu, Thi-Lan Le","doi":"10.1109/MAPR53640.2021.9585270","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585270","url":null,"abstract":"Real-time people counting from videos or images has multiple applications in intelligent transportation, density estimation, class management, and so on. This task is usually tackled by detecting people with conventional detectors. However, this approach can fail when people are in various postures or occlude each other. In this paper, we notice that even when the main part of the human body is occluded, the face and head are often still observable. We then propose a method that counts people based on face and head detection and pairing. Instead of deploying only a face or a head detector, we apply both: in many cases the person does not turn his/her face to the camera and the head detector takes over, while otherwise the face detector produces reliable results. Combining head and face detection results, however, can lead to duplicated responses for one person. We therefore propose a simple yet effective alignment technique to pair each face with the head of the same person. The remaining heads and faces that are not paired are then added to the people count to increase the true positive rate. We evaluate our proposed method on four datasets (Hollywood, Casablanca, Wider Face, and our own dataset). The experimental results show an improvement in average precision and recall compared to the original head or face detectors.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122620951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
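The pairing step can be sketched with a simple geometric rule; using the face-box center falling inside the head box is one plausible alignment criterion, not necessarily the one the paper uses:

```python
def center_inside(face, head):
    """A face pairs with a head when the face-box center lies inside the head box.
    Boxes are (x1, y1, x2, y2)."""
    cx = (face[0] + face[2]) / 2
    cy = (face[1] + face[3]) / 2
    return head[0] <= cx <= head[2] and head[1] <= cy <= head[3]

def count_people(heads, faces):
    """Pair each face with at most one head to remove duplicated responses;
    unpaired heads and unpaired faces each still count as one person."""
    paired_heads, paired_faces = set(), set()
    for i, face in enumerate(faces):
        for j, head in enumerate(heads):
            if j not in paired_heads and center_inside(face, head):
                paired_heads.add(j)
                paired_faces.add(i)
                break
    pairs = len(paired_faces)
    return pairs + (len(heads) - len(paired_heads)) + (len(faces) - pairs)
```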
T. Pham, Thi Thanh Thuy Pham, Sy Tuong Hoang, Viet-Cuong Ta
{"title":"Exploring Efficiency of GAN-based Generated URLs for Phishing URL Detection","authors":"T. Pham, Thi Thanh Thuy Pham, Sy Tuong Hoang, Viet-Cuong Ta","doi":"10.1109/MAPR53640.2021.9585287","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585287","url":null,"abstract":"The URL (Uniform Resource Locator) is used to refer to resources on the Internet by giving hyperlinks to websites. Different resources are referenced by different network addresses or different URLs. As a result, embedding malware on websites by using malicious URLs is one of the most dangerous types of cyberattacks today and poses a serious threat to the safety of systems. To detect phishing URLs, the most commonly used approach recently is to train deep learning networks on a large number of URL samples, including both malign and benign ones. However, the available URL databases have a modest number of samples. In addition, a disadvantage of these databases is the imbalanced distribution of malicious and non-malicious URL strings. In fact, it is difficult to collect or update malicious URLs because these URLs only exist for a short time; once detected, they are changed again and again. To address this challenge, in this work we propose to train a GAN, namely WGAN-GP, to generate malicious URLs from the available phishing URL data. We then integrate the generated phishing URL data into the existing URL database and train two URL classifiers, an LSTM and a GRU, to give comparative results. The experiments on different quantities of URL samples show the improvement in URL classification obtained by using WGAN-GP with the LSTM classifier.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114880944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
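Before classifiers such as the LSTM or GRU above can consume URLs, the strings are typically encoded as fixed-length integer character sequences; the vocabulary construction and padding scheme below are illustrative assumptions, not details from the paper:

```python
def build_vocab(urls):
    """Map each character seen in the training URLs to an integer id; 0 is
    reserved for padding and unknown characters."""
    chars = sorted(set("".join(urls)))
    return {c: i + 1 for i, c in enumerate(chars)}

def encode_url(url, vocab, max_len=64):
    """Fixed-length integer sequence: truncate long URLs, pad short ones with 0."""
    ids = [vocab.get(c, 0) for c in url[:max_len]]
    return ids + [0] * (max_len - len(ids))
```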
{"title":"An improved spiking network conversion for image classification","authors":"Thu Quyen Nguyen, Q. Pham, Chi Hoang-Phuong, Quang Hieu Dang, Duc Minh Nguyen, Hoang Nguyen-Huy","doi":"10.1109/MAPR53640.2021.9585199","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585199","url":null,"abstract":"Image classification is always an interesting problem due to its practical applications in real life. With the capability to learn features automatically, modern Convolutional Neural Network (CNN) models can achieve high accuracy on large and complex benchmark datasets. However, due to their high computation costs, CNN models suffer from energy consumption problems during training and hardware implementation, which limits their use in mobile and embedded applications. Recently, the Spiking Neural Network (SNN) has been proposed to overcome these drawbacks of CNN models. Like the biological nervous system, an SNN's neurons communicate with each other by sending spike trains. A neuron is only updated when a new input spike arrives. As a result, the network operates in an energy-saving mode that is suitable for implementation on hardware devices. To avoid the difficulty of training SNNs directly, an indirect training approach is proposed in this work. A CNN model is first trained with the RMSprop algorithm, then the optimised weights and biases are mapped to an SNN converted from the proposed CNN model. Experimental results confirm that our model achieves the best accuracy of 93.5% when compared to state-of-the-art SNN approaches on the Fashion-MNIST dataset.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122465175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
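CNN-to-SNN conversion of the kind described above relies on spiking neurons whose firing rates approximate the trained activations; a minimal integrate-and-fire sketch, with an illustrative threshold and simulation window:

```python
def integrate_and_fire(input_current, threshold=1.0, steps=100):
    """Simulate one integrate-and-fire neuron driven by a constant input and
    return its spike count. After conversion, a ReLU activation is approximated
    by the neuron's firing rate over the simulation window."""
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += input_current
        if v >= threshold:
            v -= threshold   # reset by subtraction preserves the residual charge
            spikes += 1
    return spikes

# The firing rate (spikes / steps) tracks the input activation value:
rates = [integrate_and_fire(a) / 100 for a in (0.0, 0.25, 0.5)]
```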
Quoc Trung Pham, Thu Quyen Nguyen, Chi Hoang-Phuong, Quang Hieu Dang, Duc Minh Nguyen, Hoang Nguyen-Huy
{"title":"A review of SNN implementation on FPGA","authors":"Quoc Trung Pham, Thu Quyen Nguyen, Chi Hoang-Phuong, Quang Hieu Dang, Duc Minh Nguyen, Hoang Nguyen-Huy","doi":"10.1109/MAPR53640.2021.9585245","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585245","url":null,"abstract":"The Spiking Neural Network (SNN), the next generation of neural networks, is expected to be more energy-efficient than the previous generation represented by the Convolutional Neural Network (CNN). Although CNNs have shown impressive results on various tasks such as natural language processing, image classification, and voice recognition, using Graphics Processing Units (GPUs) for training is expensive and not suitable for hardware implementation. The emergence of SNNs is a solution to the energy consumption of CNNs. Among the many types of hardware, Field Programmable Gate Arrays (FPGAs) are a promising platform for SNN implementation. This paper provides a survey of a number of FPGA-based SNN implementations, focusing on aspects such as neuron models, network architectures, training algorithms, and applications. The survey provides the reader with a compact and informative insight into recent efforts in this domain.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130735292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}