{"title":"General Co-Chairs","authors":"Chelsea Piers, C. Perlich, Sofus A. Macskassy, A. Caro, Andrew G. Reece, Austin R. Benson, Behzad Golshan, Benjamin Bachman, Bin Wu, Bryan Perozzi, Carl Shan, Chuanren Liu, David W. Miller, D. Cheng, Diana Palsetia, Fang Jin, Furong Li, Genaro Hernandez, Hengshu Zhu, Himabindu Lakkaraju, Huanhuan Sun, Isaac McCreery, Jacqueline A. Fairley, Joyce C. Ho, J. Adebayo, Lan Vu, Layla Pournajaf, Ling Chen, Marija Stankova, Maryam Hasan, M. Ghassemi, M. Conway, Pablo Rivas, P. Lofgren, Peter Landwehr, R. Yuan, R. Khandpur, Sabina Tomkins, Shuai Yuan, T. Babaie, T. Chakraborty, Tevin Brown, Tristan Naumann, Vrushank Vora, Wei Lu, Xiao He, Xinran He, Yanchi Liu, Ying Wei, Yuan Quan, Yuan Yao, Zhe Chen, Bing Liu, Sofus A. Macskassy","doi":"10.1109/iwat.2011.5752294","DOIUrl":"https://doi.org/10.1109/iwat.2011.5752294","url":null,"abstract":"Social Media Mahyar Shirvanimoghaddam Jingge Zhu • Communication and Storage Coding • Coding Theory • Coded and Distributed Computing • Combinatorics and Information Theory • Communication Theory • Compressed Sensing and Sparsity • Cryptography and Security • Detection and Estimation • Deep Learning for Networks • Distributed Storage • Emerging Applications of IT • Information Theory and Statistics • Information Theory in Biology • Information Theory in CS • Information Theory in Data Science • Learning Theory • Network Coding and Applications • Network Data Analysis • Network Information Theory • Pattern Recognition and ML • Privacy in Information Processing • Quantum Information Theory • Shannon Theory • Signal Processing • Source Coding and Data Compression • Wireless Communication Call for Papers","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"752 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123872490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Landslide Detection with Unmanned Aerial Vehicles","authors":"Nam Vu Hoai, Huong Mai Nguyen, Duc Cuong Pham, A. Tran, Khanh Nguyen Trong, Cuong Pham, Viet Hung Nguyen","doi":"10.1109/MAPR53640.2021.9585261","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585261","url":null,"abstract":"Landslides are among the most dangerous disasters, especially for countries with large mountainous terrain. They cause great damage to lives, infrastructure, and the environment, and lead to traffic congestion and frequent accidents. Automated landslide detection is therefore an important task for issuing warnings and reducing consequences such as blocked traffic or traffic accidents. For instance, people approaching the disaster area can adjust their routes to avoid blocked roads, or warning signs can be positioned in time to alert road users to the interrupted road ahead. This paper proposes a method to detect roads blocked by landslides using images captured by Unmanned Aerial Vehicles (UAVs). The proposed method is built on a multi-stage convolutional neural network model and comprises three components: road segmentation, blocked-road candidate extraction, and blocked-road classification. Our experiments on a self-collected dataset of 400 UAV images demonstrate that the proposed method outperforms several state-of-the-art methods.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115770677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physical Transferable Attack against Black-box Face Recognition Systems","authors":"D. M. Nguyen, Anh Nguyen, H. M. Tran, Trong Nhan Le, T. Quan","doi":"10.1109/MAPR53640.2021.9585256","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585256","url":null,"abstract":"Recent studies have shown that machine learning models in general, and deep neural networks such as CNNs in particular, are vulnerable to adversarial attacks. In face recognition specifically, deep networks can easily be deceived by adding a visually imperceptible adversarial perturbation to the input images. However, most of these works assume an idealized scenario in which the attacker has perfect information about the victim model and the attack is performed in the digital domain, which is not a realistic assumption. As a result, these methods often transfer poorly, or not at all, to the real world. To address this issue, we propose a novel physical transferable attack on deep face recognition systems that works in real-world settings without any knowledge of the victim model. Our experiments on various state-of-the-art models with different architectures and training losses show non-trivial attack success rates. Given these results, we believe our method can enable further studies on improving the adversarial robustness and security of deep face recognition systems.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122020104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DF-FSOD: A Novel Approach for Few-shot Object Detection via Distinguished Features","authors":"Anh-Khoa Nguyen Vu, Thanh-Danh Nguyen, Vinh-Tiep Nguyen, T. Ngo","doi":"10.1109/MAPR53640.2021.9585248","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585248","url":null,"abstract":"Few-shot object detection (FSOD) is a challenging task in which detectors are trained to recognize unseen objects with limited training data. Most existing methods are evaluated on benchmarks built with a fixed number of base and novel categories. Specifically, the number of base classes is larger than the number of novel classes, which positively affects the performance evaluated on novel data. However, few works focus on the effect of such dominant categories on the performance of FSOD models. In this paper, we investigate the effectiveness of detectors under different ratios of base to novel categories in the novel phase. Based on our findings about the relationship between base and novel classes, we present a new approach, Distinguished Features for FSOD (DF-FSOD), which encourages the detector to learn distinguished features that better capture novel objects via base-class expansion. With extremely scarce labeled data, our proposed method outperforms previous works on the unseen classes of PASCAL VOC by an average of 4% AP@50.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115466506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep learning based fracture detection in arm bone X-ray images","authors":"Hoai Phuong Nguyen, T. Hoang, Huy Hoang Nguyen","doi":"10.1109/MAPR53640.2021.9585292","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585292","url":null,"abstract":"A large number of arm fracture-related injuries are reported in hospitals and clinics around the world. In this paper, we propose a novel deep learning-based method for fracture detection in arm bone X-ray images. First, we preprocess the X-ray image with an algorithm that combines YOLACT++ for image segmentation and Contrast Limited Adaptive Histogram Equalization (CLAHE) for contrast enhancement. Then, YOLOv4 is trained on a small dataset with four data augmentation techniques to identify and localize bone fractures in X-ray images. The best result obtained with our proposed method is 81.91%. Experimental results also confirm that our method outperforms the Faster-RCNN-based solution on the small dataset.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125452357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastTacotron: A Fast, Robust and Controllable Method for Speech Synthesis","authors":"D. V. Sang, Lam Thu","doi":"10.1109/MAPR53640.2021.9585267","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585267","url":null,"abstract":"Recent state-of-the-art neural text-to-speech synthesis models have significantly improved the quality of synthesized speech. However, previous methods still have several problems. Autoregressive models suffer from slow inference, while non-autoregressive models usually have a complicated, time- and memory-consuming training pipeline. This paper proposes a novel model called FastTacotron, an improved text-to-speech method based on ForwardTacotron. The proposed model uses the recurrent Tacotron architecture but replaces its autoregressive attentive part with a single forward pass to accelerate inference. The model also replaces the attention mechanism in Tacotron with a length regulator like the one in FastSpeech for parallel mel-spectrogram generation. Moreover, we introduce additional prosodic information (e.g., pitch, energy, and more accurate durations) as conditional inputs to make the duration predictor more accurate. Experiments show that our model matches state-of-the-art models in speech quality and inference speed, nearly eliminates word skipping and repetition in particularly hard cases, and makes it possible to control the speed and pitch of the generated utterance. More importantly, our model converges in just a few hours of training, up to 11.2x faster than existing methods. Furthermore, the memory requirement of our model grows linearly with sequence length, which makes it possible to predict complete articles in one pass. Audio samples are available at https://bit.ly/3xguaCW.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"381 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133380305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A More Focus on Multi-degradation Method for Single Image Super-Resolution","authors":"Ngoc-Khanh Nguyen, Thanh-Danh Nguyen, Vinh-Tiep Nguyen","doi":"10.1109/MAPR53640.2021.9585260","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585260","url":null,"abstract":"Single Image Super-Resolution (SISR) aims to reconstruct a High-Resolution (HR) image from a Low-Resolution (LR) one. Recent works, especially deep learning-based approaches, mainly address LR images degraded by a fixed degradation kernel, typically bicubic interpolation. However, this assumption is rarely practical, since an input image may suffer from many other degradations (e.g., blur or noise). Previous works tackle such multiple degradations by proposing new models that aim to lessen the restrictions of learning-based methods and take advantage of CNN architectures. Unfortunately, they ignore existing state-of-the-art CNN-based SISR models trained on a fixed degradation kernel. In this work, we introduce a context-extending module that generates more realistic types of degradation on the fly. We also propose a comprehensive cross-degradation loss function that enables the model to better adapt to real-world conditions. With this proposal, we can generalize arbitrary end-to-end learning-based networks. Measured by Peak Signal-to-Noise Ratio (PSNR), our proposed method improves on the EDSR baseline by a significant 34.5% (from 20.14 dB to 27.09 dB) on noisy images while maintaining comparable results for bicubic downsampling.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121759599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DS-YOLOv5: Deformable and Scalable YOLOv5 for Mathematical Formula Detection in Scientific Documents","authors":"Minh-Thang Nguyen, Thi-Lan Le, Lan Huong Nguyen Thi, T. Nguyen","doi":"10.1109/MAPR53640.2021.9585254","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585254","url":null,"abstract":"Mathematical formula detection (MFD) is a prerequisite step for the digitization of scientific documents. The MFD task has two key challenges: the large scale span between embedded and isolated formulas, and the huge variation in the ratio between height and width. However, the detection accuracy of most existing approaches, which rely on page segmentation, still needs improvement due to errors on complex documents. In this work, to address the important problem of scale variation, we assess the performance of a multi-scale deformable method for the MFD task based on deformable convolution, image representation, and the YOLOv5 detector. The proposed method is evaluated on the Marmot dataset, an existing benchmark. The experimental results show that the proposed method outperforms previous methods on the Marmot dataset by a large margin. Moreover, it achieves correct detection accuracies of 82.42% on embedded formulas and 90.69% on isolated formulas on the Marmot dataset, a significant error reduction.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114302068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time Sleeping Posture Recognition For Smart Hospital Beds","authors":"Ngoc Phu Doan, Nguyen Duc Anh Pham, Hung-Manh Pham, Huu Trung Nguyen, Thuy Anh Nguyen, H. H. Nguyen","doi":"10.1109/MAPR53640.2021.9585289","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585289","url":null,"abstract":"Unsuitable sleeping positions are important contributors to bad sleep quality and can even have serious long-term consequences. Many studies emphasize that pressure sensor-based solutions are effective for in-bed posture assessment in both home and hospital environments. Surprisingly, none of these studies considers an Edge computing-based solution for body pose recognition on smart hospital beds. In this paper, we propose a real-time sleeping posture recognition algorithm that combines a preprocessing technique with an EfficientNet-B0-based classifier and an AM-Softmax loss function. Experimental results confirm that our proposed method achieves over 99% accuracy in both 5-fold and 10-fold cross-validation and 95.32% in Leave-One-Subject-Out (LOSO) validation for 17 sleeping postures, greatly surpassing the previous method on the same task. Furthermore, our solution satisfies the real-time requirement for various data sampling rates when deployed on the Edge computing-based smart hospital bed.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133487010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locality and Relative Distance-Aware Non-local Networks for Hand-Raising Detection in Classroom Video","authors":"Thu-Hien Le, Hoang-Nhat Tran, Phuong-Dung Nguyen, Hong-Quan Nguyen, Thuy-Binh Nguyen, Thi-Lan Le, Thanh-Hai Tran, Hai Vu","doi":"10.1109/MAPR53640.2021.9585284","DOIUrl":"https://doi.org/10.1109/MAPR53640.2021.9585284","url":null,"abstract":"Detecting and understanding interactions between students and teachers in the classroom is an important requirement for computer vision-based educational assistive systems. Recently, techniques for modeling deep long-range spatial dependencies, such as non-local networks, have proven very effective for such tasks. Yet, regarding global context generation, our analysis shows that the non-local operation compares pixels only by their values and therefore cannot capture structural information. In this paper, we first extend the non-local module to incorporate locality attributes. We further observe that each query is treated uniformly to generate the attention map. Hence, we incorporate distance-wise representations, with an efficient implementation, into the non-local formulation. The proposed locality and relative distance-aware non-local module is integrated into the Libra-RCNN object detection architecture and evaluated in our experiments on a pre-access hand-raising gesture dataset. This straightforward modification achieves 0.8% and 2.8% higher performance than the baseline Libra-RCNN model in terms of mAP@0.5 and mAP@0.75, respectively.","PeriodicalId":233540,"journal":{"name":"2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132567966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}