Dai Duong Tran, Truong Thinh Le, Minh Tam Duong, Minh Pham, Minh-Son Nguyen
{"title":"FPGA Design for Deep Q-Network: A case study in Cartpole Environment","authors":"Dai Duong Tran, Truong Thinh Le, Minh Tam Duong, Minh Pham, Minh-Son Nguyen","doi":"10.1109/MAPR56351.2022.9925007","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9925007","url":null,"abstract":"Deep Reinforcement Learning is a subfield of Machine Learning that combines Reinforcement Learning and Deep Learning. In Reinforcement Learning, an agent interacts with the environment by taking an action; the environment then returns a reward and a state based on that action. The objective of the agent is to maximize reward. In Deep Learning, an agent uses a neural network to learn and then make a decision or take an action. This paper introduces an FPGA design for Deep Q-Network to accelerate the execution time of Deep Reinforcement Learning, including both the training and inference phases. This paper uses the Cartpole environment as a case study and targets the Virtex-7 VC707 FPGA.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132094038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trong-Thuan Nguyen, Dung Truong, Nguyen D. Vo, Khang Nguyen
{"title":"GaDocNet: Rethinking the Anchoring Scheme and Loss Function in Vietnamese Document Images","authors":"Trong-Thuan Nguyen, Dung Truong, Nguyen D. Vo, Khang Nguyen","doi":"10.1109/MAPR56351.2022.9924997","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924997","url":null,"abstract":"In recent years, page object detection has received much attention in document image understanding. However, its application to Vietnamese document images has many limitations. In this paper, we address the page object detection problem in Vietnamese document images. Specifically, we experiment with four state-of-the-art object detection methods: Dynamic Faster R-CNN, Guided Anchoring Faster R-CNN, PointRend, and CascadeTabNet on the Vietnamese document image dataset named UIT-DODV. The UIT-DODV dataset is the first Vietnamese document image dataset with four object classes: Table, Figure, Caption, and Formula. In addition, we further evaluate the bounding box regression loss functions of the IoU family. We then propose the EIoU loss function for efficient page object detection in Vietnamese document images. Based on the preliminary experimental results, we present GaDocNet along with the EIoU loss function. The proposal achieves 76.1%, which is 1.6% higher than the baseline on the UIT-DODV dataset. Moreover, we evaluate Deformable DETR, PAA, RepPoints, FoveaBox, FSAF, and ATSS on UIT-DODV. The empirical evaluation points out the advantages of our approach, which is the foundation for further work.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114604597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human action recognition from inertial sensors with Transformer","authors":"Trung-Hieu Le, Thanh-Hai Tran, Cuong Pham","doi":"10.1109/MAPR56351.2022.9924794","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924794","url":null,"abstract":"Human action recognition is an attractive research topic because it opens up many practical applications such as healthcare, entertainment, and robot interaction. Hand gestures in particular are becoming one of the most convenient means of communication between humans and machines. In this study, the transformer model, a deep learning neural network developed primarily for natural language processing and vision tasks, is investigated for the analysis of time-series signals. The self-attention mechanism inherent in the transformer expresses individual dependencies between signal values within a time series. As a result, it can boost the performance of state-of-the-art convolutional neural networks in terms of memory requirements and computation time. We evaluate the proposed method on three published sensor datasets (CMDFALL, C-MHAD and DaLiAc) and show that it achieves better performance than conventional methods. Specifically, on the S3 group of the CMDFALL dataset, the F1 score is 19.04% higher than that of the conventional method. On the C-MHAD dataset, the accuracy is up to 99.56%. The results confirm the role of transformer models for human activity recognition.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125147868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COVID-Net Network and Application on Support Diagnosis COVID-19 over X-ray Images","authors":"Thanh-Ha Do, H. Le, Trung-Hieu Ha","doi":"10.1109/MAPR56351.2022.9924841","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924841","url":null,"abstract":"Early diagnosis through X-ray images is a low-cost form of diagnosis, often used in hospitals to assist doctors in making health treatment plans. This paper presents a new approach for supporting the diagnosis of COVID-19 based on chest X-ray images. Specifically, this paper proposes using the COVID-Net model to classify the damage as caused by COVID-19 or by other causes. Data augmentation using seam carving was also researched and evaluated with different energy functions. The experimental results obtained on different databases are promising.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126141239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Evaluating Video Summary Approaches","authors":"Tien-Dung Mai, Tien Do, Duy-Dinh Le","doi":"10.1109/MAPR56351.2022.9924934","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924934","url":null,"abstract":"Video summarization is a crucial task for coping with the explosion of video data. The goal of video summarization is to create a shortened version of the original video while retaining its essential and pertinent content. In general, a video summary system is composed of three primary modules: shot boundary detection, shot scoring, and shot selection. However, existing research focuses exclusively on a single module, necessitating a comprehensive assessment when methods are changed across modules. In this study, we provide a framework for evaluating alternative techniques for the video summary problem that permits multiple combinations in different modules to evaluate the significance of adding method stages in video summaries. The analysis and combination results of the framework reveal that the combination of Uniform and DSNet Anchor-free provides state-of-the-art performance on the SumMe dataset. We also provide the framework source code for the community.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126145040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Reflectional Symmetry of Binary Shapes Based on Generalized R-Transform","authors":"Thanh Tuan Nguyen, T. Nguyen, Thanh-Hai Tran","doi":"10.1109/MAPR56351.2022.9924894","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924894","url":null,"abstract":"Analyzing reflectionally symmetric features inside an image is one of the important processes for recognizing the peculiar appearance of natural and man-made objects, biological patterns, etc. In this work, we present an efficient detector of reflectionally symmetric shapes by addressing a class of projection-based signatures that are structured by a generalized $\\mathcal{R}_{fm}$-transform model. To this end, we first prove the $\\mathcal{R}_{fm}$-transform in accordance with reflectional symmetry detection. Then different corresponding $\\mathcal{R}_{fm}$-signatures of binary shapes are evaluated in order to determine which exponentiation of the $\\mathcal{R}_{fm}$-transform is best for the detection. Experimental results on single/compound contour-based shapes have validated that the exponentiation of 10 is the most discriminatory, with over 2.7% better performance on the multiple-axis shapes in comparison with the conventional one. Additionally, the proposed detector also outperforms most other existing methods. This finding is recommended for practical applications.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125985279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Construction for Path Expression in XML data","authors":"Uyen Han Thuy Thai","doi":"10.1109/MAPR56351.2022.9924945","DOIUrl":"https://doi.org/10.1109/MAPR56351.2022.9924945","url":null,"abstract":"XML and semi-structured data have been widely used as a standard for representing and exchanging data. Generally, they are modeled as a labeled directed graph. Given a path expression, it is desirable to select nodes or node sets quickly and efficiently. To serve this purpose, the A(k)-index creates a graph index based on the concept of bisimilarity. The naive approach to the A(k)-index, which requires an exhaustive scan of all partitions, incurs an expensive index construction cost. In this paper, we propose the New ImpIndex, a new approach to index construction that reduces construction time through a marking method. This technique marks the significant pairs of partitions during the scan, then traverses only these partitions and ignores the others in the following iterations. A salient property of the New ImpIndex is that it remains stable for large databases and large values of k. Moreover, we combine this approach with our previously proposed one, the Old ImpIndex, to create the Asso ImpIndex and achieve higher performance. We experimentally demonstrate that our proposed algorithms, New ImpIndex and Asso ImpIndex, show more advantages than the existing approach.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130687128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tutorial","authors":"Michael H. Molenda, D. Subramony","doi":"10.4324/9781315194721-8","DOIUrl":"https://doi.org/10.4324/9781315194721-8","url":null,"abstract":"","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128751426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}