{"title":"AgileGCN: Accelerating Deep GCN with Residual Connections using Structured Pruning","authors":"Qisheng He, Soumyanil Banerjee, L. Schwiebert, Ming Dong","doi":"10.1109/MIPR54900.2022.00011","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00011","url":null,"abstract":"Deep Graph Convolutional Networks (GCNs) with multiple layers have been used for applications such as point cloud classification and semantic segmentation and achieved state-of-the-art results. However, they are computationally expensive and have a high run-time latency. In this paper, we propose AgileGCN, a novel framework to compress and accelerate deep GCN models with residual connections using structured pruning. Specifically, in each residual structure of a deep GCN, channel sampling and padding are applied to the input and output channels of a convolutional layer, respectively, to significantly reduce its floating point operations (FLOPs) and number of parameters. Experimental results on two benchmark point cloud datasets demonstrate that AgileGCN achieves significant FLOPs and parameters reduction while maintaining the performance of the unpruned models for both point cloud classification and segmentation.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114658395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Personalized Health Navigation for Drivers","authors":"Luntian Mou, Yiyuan Zhao, Chao Zhou, Baocai Yin, W. Gao, Ramesh C. Jain","doi":"10.1109/MIPR54900.2022.00059","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00059","url":null,"abstract":"Driving activities occupy more and more time for moderns and often can elicit bad states like stress, fatigue, or anger, which can significantly impact road safety and driver health. Therefore, health issues caused by driving should be taken seriously. Whichever the combination of bad health states, it may lead to serious consequences during driving, as evidenced by the large number of traffic accidents that occur each year due to various health issues. As a result of rapid advances in multimedia and sensor technologies, driver health can be automatically detected using multimodal measurements. Therefore, a system that includes driver health detection and health navigation is needed to continuously monitor driver health states and navigate drivers to positive health states to ensure safe driving. In this article, we survey recent related works on driver health detection, as well as discuss some of the main challenges and promising areas to stimulate progress in personalized health navigation for drivers. Finally, we propose a cybernetic-based personalized health navigation framework for drivers (PHN-D), which provides a new paradigm in the field of driver health.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120983103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Domain Knowledge Transfer for Skeleton-based Action Recognition based on Graph Convolutional Gradient Reversal Layer","authors":"T.-J. Liao, Jun-Cheng Chen, Shyh-Kang Jeng, Chun-Feng Tai","doi":"10.1109/MIPR54900.2022.00076","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00076","url":null,"abstract":"For skeleton-based action recognition, since there usually exists many nuances between different datasets, including viewpoints, the number of available joints for a skele-ton, the type of actions, etc, it hinders to apply and leverage the knowledge of a pretrained model for one dataset to an-other except retraining a new model for the target dataset. To address this issue, we propose a cross-domain knowledge transfer module based on gradient reversal layer along with adaptive graph convolutional network to effectively transfer the knowledge from one domain to another. The adaptive graph convolution module allows the proposed method to adaptively learn the topological relation between joints and is very useful for the scenarios when the numbers of skele-ton joints for the two domains are different and the topo-logical correspondences of joints are not clearly specified. With extensive experiments from NTU-RGB+D 60 to the PKU, CITI3D, and NW datasets, the proposed approach achieves significantly better results than other state-of-the-art spatio-temporal graph convolutional network methods which are trained on the target dataset only, and this also demonstrates the effectiveness of the proposed approach.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122038576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive Randomized Tree Coding of Speech","authors":"Hoontaek Oh, J. Gibson","doi":"10.1109/MIPR54900.2022.00020","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00020","url":null,"abstract":"We study a recursively adaptive architecture for speech coding based on the concept of tree coding combined with recursive least squares lattice estimation of the autoregressive component and gradient based estimation of the moving average part of the short term prediction and gradient/autocorrelation based long term prediction algorithms, all adapting to minimize the perceptually weighted reconstruction error. The new idea of concatenated, randomized multitrees is introduced and explored. Voice activity detection (VAD) and comfort noise generation (CNG) are included to reduce the bit rate and the number of computations required. Performance is compared to the widely implemented and utilized AMR codec and we demonstrate comparable performance at bit rates of 4.5 to 7.5 kbits/s.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129827622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Evaluation of Machine Generated Feedback For Text and Image Data","authors":"Pratham Goyal, Anjali Raj, Puneet Kumar, Kishore Babu Nampalle","doi":"10.1109/MIPR54900.2022.00081","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00081","url":null,"abstract":"In this paper, a novel system, ‘AutoEvaINet,’ has been developed for evaluating machine-generated feedback in response to multimodal input containing text and images. A new metric, ‘Automatically Evaluated Relevance Score’ (AER Score), has also been defined to automatically compute the similarity between human-generated comments and machine-generatedfeedback. The AutoEvalNet's architecture comprises a pre-trained feedback synthesis model and the proposed feedback evaluation model. It uses an ensemble of Bidirectional Encoder Representations from Transformers (BERT) and Global Vectors for Word Representation (GloVe) models to generate the embeddings of the ground-truth comment and machine-synthesized feedback using which the similarity score is calculated. The experiments have been performed on the MMFeed dataset. The generated feedback has been evaluated automatically using the AER score and manually by having the human users evaluate the feedbackfor relevance to the input and ground-truth comments. The values of the AER score and human evaluation scores are in line, affirming the AER score's applicability as an automatic evaluation measure for machine-generated text instead of human evaluation.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130524550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Genetic Learning with Evidential Uncertainty for Identifying Mushroom Toxicity","authors":"Oguz Aranay, P. Atrey","doi":"10.1109/MIPR54900.2022.00078","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00078","url":null,"abstract":"Mushroom's classification as edible or poisonous is an important problem that can have a direct impact on hu-man life. However, most of the existing works do not in-clude model uncertainty in their analysis and suffer from over-confidence issue. To solve this problem, we propose a learning framework, called deep active genetic with evi-dential uncertainty (DAG-EU), to model the uncertainty of the class probability to classify mushrooms. The framework selects the data points with high uncertainty and the most influencing features by using genetic algorithms. The ex-perimental results on the mushrooms dataset demonstrate that the proposed framework can improve the model classi-fication accuracy by 2.3% compared to the methods in the same domain. Moreover, it outperforms the other models from literature by 3.6%.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125780786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Temporal Action Localization Through Contrastive Learning","authors":"Chengzhe Yang, Weigang Zhang","doi":"10.1109/MIPR54900.2022.00075","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00075","url":null,"abstract":"In recent years, weakly-supervised temporal action localization (WS-TAL) with only video-level annotations, which aims to learn whether each untrimmed video contains action frames gains more attention. Existing most WS-TAL methods especially rely on features learned for action localization. Therefore, it is important to improve the ability to separate the frames of action instances from the background frames. To address this challenge, this paper introduces a framework that learns two extra constraints, Action-Background Learning and Action-Foreground Learning. The former aims at maximizing the discrepancy inside the feature of action and background while the latter avoids the misjudgement of action instance. We evaluate the proposed model on two benchmark datasets, and the experimental results show that the method could gain comparable performance with current state-of-the-art WS-TAL methods.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127908313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Expression Recognition in the Wild: Dataset Configurations","authors":"Nathan Galea, D. Seychell","doi":"10.1109/MIPR54900.2022.00045","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00045","url":null,"abstract":"Facial Expression Recognition (FER) in the wild has become an increasingly significant and focused area within computer vision, with many studies tackling different aspects to improve its recognition accuracy. This paper utilizes RAF-DB and AffectNet as the two leading datasets in the scene and compares the different experimental dataset configurations to state-of-theart techniques referred to as Amend Representation Module (ARM) and Self-Cure Network (SCN). The paper demonstrates how different dataset configurations should be the main focal point of improving the FER task and how there cannot be significant improvements in the FER task with a lack of a favorable dataset.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134345763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Highly Optimized GPU Batched Elasticnet Solver (BENS) with Application to Real- Time Keypoint Detection for Image Retrieval","authors":"Zheng Guo, Thanh Hong-Phuoc, N. Khan, L. Guan","doi":"10.1109/MIPR54900.2022.00070","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00070","url":null,"abstract":"In this paper, we present a highly optimized GPU batched elastic-net solver (BENS) with application to real-time key-point detection for image retrieval. BENS was optimized to perform hundreds of thousands of small elastic-net fits by batching each fit from specific steps in the elastic-net computation into a large matrix multiplication which can be computed efficiently using the CUBLAS library. The main motivation for BENS was a real-time implementation of the Sparse-Coding Key-point detector (SCK) algorithm which has reaching applications in science, engineering, social science and medicine. When BENS was applied to accelerate SCK, we have achieved a 232x speed up compared to the original CPU implementation of SCK. To demonstrate the newly accelerated SCK algorithm, we conducted an Bo Vw based image retrieval experiment using SCK as the key-point detector.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124294841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview of Recent Work in Multimedia Forensics","authors":"Kratika Bhagtani, Amit Kumar Singh Yadav, Emily R. Bartusiak, Ziyue Xiang, Ruiting Shao, Sriram Baireddy, E. Delp","doi":"10.1109/MIPR54900.2022.00064","DOIUrl":"https://doi.org/10.1109/MIPR54900.2022.00064","url":null,"abstract":"In this paper, we review recent work in media forensics for digital images, video, audio, and documents.","PeriodicalId":228640,"journal":{"name":"2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114750759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}