{"title":"A robust framework for mathematical formula detection","authors":"M. Tran, Tri Pham, Tien Nguyen, Tien Do, T. Ngo","doi":"10.1109/MAPR53640.2021.9585197","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585197","url":null,"abstract":"Mathematical formulas identification is a crucial step in the pipeline of many tasks such as mathematical information retrieval, storing digital science documents, etc. For basic mathematical formulas recognition, all these tasks need to detect the bounding boxes of mathematical expression as a prerequisite step. Currently, deep learning-based object detection methods work well for mathematical formula detection (MFD). These methods are divided into two categories: anchor self-study and anchor not self-study. The anchor self-study method is efficient with large quantity labels but not so well with small quantities, whereas the second type of method works better with small quantities. Therefore, we proposed an algorithm that keeps the good prediction of each type and then merges both into final results. To demonstrate the hypothesis, we select two typical object detection methods: YOLOv5 and Faster RCNN as the representation of two kind approaches to building an MFD framework. Our experiment results on ICDAR2021-MFD1 achieved the F1 score of the whole system is 89.3 while the single detector just reached 74.2, 88.9 (Faster RCNN and YOLOv5 respectively) that proving the effectiveness of the proposal.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126534853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boundary delineation of reflux esophagitis lesions from endoscopic images using color and texture","authors":"Danh H. Vu, Long-Thuy Nguyen, Van-Tuan Nguyen, Thanh-Hai Tran, V. Dao, Hai Vu","doi":"10.1109/MAPR53640.2021.9585290","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585290","url":null,"abstract":"Automatic assessment of medical images and endoscopic images in particular is an attractive research topic recent years. To achieve this goal, many tasks must be conducted for example lesions detection, segmentation and classification. In order to design suitable models for such tasks, it would be preferable to know at first: i) which characteristics that differentiate a lesion from a normal region; ii) how large is the boundary of these two regions that still allows to distinguish them. This paper presents an in-depth study of the role of color and texture features for delineation of boundary between a lesion region and a background region. To this end, from the groundtruth contour of a manually segmented lesion, we first expand two margins in two directions. We name inner margin in the lesion region and outer margin in the background region. We then extract color dependent features in different color spaces (HSV, RGB, Lab) and texture features (LBP, HOG, GLCM) on these two margins. Finally we deploy the Support Vector Machine (SVM) technique to classify two classes (lesion and non-lesion). Extensive experiments conducted on a dataset of endoscopic images answer to our aforementioned questions and give some suggestions for designing suitable models of lesion detection in the future.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124995559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unweighted Bipartite Matching For Robust Vehicle Counting","authors":"Khanh Ho, H. Le, K. Nguyen, Thua Nguyen, Tien Do, T. Ngo, Thanh-Son Nguyen","doi":"10.1109/MAPR53640.2021.9585273","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585273","url":null,"abstract":"Intelligent Transportation System (ITS) plays an essential role in smart cities. Through ITS, local authorities could handle enormous traffic flows with minimal effort and solve traffic-related problems such as traffic congestion or traffic regulation violating behaviours. In this work, we designed a system that has the ability to count vehicles moving in specific directions on the road. Such automated systems also have to deal with the diverse weather and instabilities in captured media, making current tracking algorithms become prone to errors. This problem is even more challenging in Vietnam and other developing countries, where traffic on the road is much more complex with the presence of small vehicles such as bicycles and motorbikes, thus tracking algorithms would be more likely to fail. Our proposed method for Track Joining was built on top of deepSORT, incorporating Taylor Expansion and Unweighted Bipartite Maximum Matching to predict missing movements or identify duplicated vehicle tracks, then attempt to merge them. In HCMC AI City Challenge 20201, our whole system outperforms other approaches by achieving the lowest overall RMSE score: an average of 1.39 fails per video segment on a benchmark dataset.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131827252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Zero-shot Cross-lingual Aspect-based Sentiment Analysis using Pre-trained Multilingual Language Models","authors":"Khoa Thi-Kim Phan, D. Hao, D. Thin, N. Nguyen","doi":"10.1109/MAPR53640.2021.9585242","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585242","url":null,"abstract":"Aspect-based sentiment analysis (ABSA) has received much attention in the Natural Language Processing research community. Most of the proposed methods are conducted exclusively in English and high-resources languages. Leveraging resources available from English and transferring to low-resources languages seems to be an immediate solution. In this paper, we investigate the performance of zero-shot cross-lingual transfer learning based on pre-trained multilingual models (mBERT and XLM-R) for two main sub-tasks in the ABSA problem: Aspect Category Detection and Opinion Target Expression. We experiment on the benchmark data sets of six languages as English, Russian, Dutch, Spanish, Turkish, and French. The experimental results demonstrated that using the XLM-R model can yield relatively acceptable results for the zero-shot cross-lingual scenario.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115722820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Siamese Attention and Point Adaptive Network for Visual Tracking","authors":"T. Dinh, Long Tran Quoc, Kien Thai Trung","doi":"10.1109/MAPR53640.2021.9585250","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585250","url":null,"abstract":"Siamese-based trackers have achieved excellent performance on visual object tracking. Most of the existing trackers usually compute the features of the target template and search image independently and rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. This paper proposes Siamese attention and point adaptive head network referred to as SiamAPN for Visual Tracking. Siamese attention includes self-attention and cross-attention for feature enhancement and aggregating rich contextual inter-dependencies between the target template and the search image. And Point head network for bounding box prediction is both proposal and anchor-free. The proposed framework is simple and effective. Extensive experiments on visual tracking benchmarks, including OTB100, UAV123, and VOT2018, demonstrate that our tracker achieves state-of-the-art performance and runs at 45 FPS.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127992178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual-guided audio source separation: an empirical study","authors":"Thanh Thi Hien Duong, Huu Manh Nguyen, Hai Nghiem Thi, Thi-Lan Le, Phi-Le Nguyen, Q. Nguyen","doi":"10.1109/MAPR53640.2021.9585244","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585244","url":null,"abstract":"Real-world video scenes are usually very complicated as they are mixtures of many different audio-visual objects. Humans with normal hearing ability can easily locate, identify and differentiate sound sources which are heard simultaneously. However, this is an extremely difficult task for machines as the creation of machine listening algorithms that can automatically separate sound sources in difficult mixing conditions has remained very challenging. In this paper, we consider the use of a visual-guided audio source separation approach for separating sounds of different instruments in the video, where detected visual objects are used to assist the sound separation process. We particularly investigate the use of different object detectors for the task. In addition, as an empirical study, we analyze the effect of training datasets on separation performance. Finally, experiment results obtained from a benchmark dataset MUSIC confirm the advantages of the new object detector investigated in the paper.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123503008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}