2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Publications

AgileGCN: Accelerating Deep GCN with Residual Connections using Structured Pruning
Qisheng He, Soumyanil Banerjee, L. Schwiebert, Ming Dong
DOI: 10.1109/MIPR54900.2022.00011 (published 2022-08-01)
Abstract: Deep Graph Convolutional Networks (GCNs) with multiple layers have been used for applications such as point cloud classification and semantic segmentation and have achieved state-of-the-art results. However, they are computationally expensive and have high run-time latency. In this paper, we propose AgileGCN, a novel framework to compress and accelerate deep GCN models with residual connections using structured pruning. Specifically, in each residual structure of a deep GCN, channel sampling and padding are applied to the input and output channels of a convolutional layer, respectively, to significantly reduce its floating point operations (FLOPs) and number of parameters. Experimental results on two benchmark point cloud datasets demonstrate that AgileGCN achieves significant FLOPs and parameter reductions while maintaining the performance of the unpruned models for both point cloud classification and segmentation.
Citations: 0
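
The abstract does not spell out how channel sampling and padding interact with the residual connection; the following is a minimal PyTorch sketch of one plausible reading, in which a fixed subset of input channels feeds a smaller convolution and the output is zero-padded back to full width so the residual addition still applies. The class name, the fixed random index choice, and the 1x1 convolution are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PrunedResidualConv(nn.Module):
    """Residual 1x1 conv whose input channels are subsampled and whose
    output channels are zero-padded back to the full width, so the
    residual addition still type-checks while FLOPs/params shrink."""

    def __init__(self, channels: int, keep: int):
        super().__init__()
        self.channels = channels
        self.keep = keep
        # fixed random subset of input channels to keep (one illustrative choice)
        self.register_buffer("idx", torch.randperm(channels)[:keep])
        # smaller conv: keep -> keep channels instead of channels -> channels
        self.conv = nn.Conv2d(keep, keep, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # channel sampling on the input
        sampled = x.index_select(dim=1, index=self.idx)
        out = self.conv(sampled)
        # zero-pad the missing output channels back to full width
        pad = x.new_zeros(x.shape[0], self.channels - self.keep, *x.shape[2:])
        out = torch.cat([out, pad], dim=1)
        return x + out  # residual connection preserved

# Keeping 16 of 64 channels cuts this conv's FLOPs/params by roughly (16/64)^2.
layer = PrunedResidualConv(channels=64, keep=16)
y = layer(torch.randn(2, 64, 32, 32))
```
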
A Review of Personalized Health Navigation for Drivers
Luntian Mou, Yiyuan Zhao, Chao Zhou, Baocai Yin, W. Gao, Ramesh C. Jain
DOI: 10.1109/MIPR54900.2022.00059 (published 2022-08-01)
Abstract: Driving occupies an increasing share of people's time and can elicit negative states such as stress, fatigue, or anger, which significantly impact road safety and driver health. Health issues caused by driving should therefore be taken seriously. Any combination of these negative states may lead to serious consequences during driving, as evidenced by the large number of traffic accidents that occur each year due to various health issues. Thanks to rapid advances in multimedia and sensor technologies, driver health can be automatically detected using multimodal measurements. A system that combines driver health detection and health navigation is therefore needed to continuously monitor driver health states and guide drivers toward positive health states to ensure safe driving. In this article, we survey recent work on driver health detection and discuss some of the main challenges and promising directions to stimulate progress in personalized health navigation for drivers. Finally, we propose a cybernetics-based personalized health navigation framework for drivers (PHN-D), which provides a new paradigm in the field of driver health.
Citations: 0
Cross-Domain Knowledge Transfer for Skeleton-based Action Recognition based on Graph Convolutional Gradient Reversal Layer
T.-J. Liao, Jun-Cheng Chen, Shyh-Kang Jeng, Chun-Feng Tai
DOI: 10.1109/MIPR54900.2022.00076 (published 2022-08-01)
Abstract: For skeleton-based action recognition, there are usually many nuances between different datasets, including viewpoints, the number of available joints per skeleton, and the types of actions. These differences make it hard to apply a model pretrained on one dataset to another without retraining a new model on the target dataset. To address this issue, we propose a cross-domain knowledge transfer module based on a gradient reversal layer, combined with an adaptive graph convolutional network, to effectively transfer knowledge from one domain to another. The adaptive graph convolution module allows the proposed method to adaptively learn the topological relations between joints and is particularly useful when the numbers of skeleton joints in the two domains differ and the topological correspondences of joints are not clearly specified. In extensive experiments transferring from NTU-RGB+D 60 to the PKU, CITI3D, and NW datasets, the proposed approach achieves significantly better results than other state-of-the-art spatio-temporal graph convolutional network methods trained on the target dataset only, demonstrating its effectiveness.
Citations: 0
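
The gradient reversal layer at the core of this approach is a standard construction (identity in the forward pass, negated gradient in the backward pass); a minimal PyTorch sketch follows. The paper's graph convolutional variant and its adaptive graph convolutions are not reproduced here, and the shapes in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lam in the
    backward pass. A domain classifier placed after this layer therefore
    pushes the upstream feature extractor toward domain-invariant features."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: skeleton features -> gradient reversal -> domain classifier.
feats = torch.randn(8, 256, requires_grad=True)
domain_head = nn.Linear(256, 2)
domain_logits = domain_head(grad_reverse(feats, lam=0.5))
```
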
Recursive Randomized Tree Coding of Speech
Hoontaek Oh, J. Gibson
DOI: 10.1109/MIPR54900.2022.00020 (published 2022-08-01)
Abstract: We study a recursively adaptive architecture for speech coding based on tree coding. It combines recursive least squares lattice estimation of the autoregressive component with gradient-based estimation of the moving-average part of the short-term prediction, plus gradient/autocorrelation-based long-term prediction, all adapting to minimize the perceptually weighted reconstruction error. The new idea of concatenated, randomized multitrees is introduced and explored. Voice activity detection (VAD) and comfort noise generation (CNG) are included to reduce the bit rate and the number of computations required. Performance is compared to the widely implemented AMR codec, and we demonstrate comparable performance at bit rates of 4.5 to 7.5 kbit/s.
Citations: 0
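
As a small illustration of the gradient-based predictor adaptation mentioned above, here is a normalized-LMS short-term predictor sketch in Python; the residual it produces is what a tree coder would quantize and search over. The RLS lattice, long-term prediction, perceptual weighting, and the randomized multitrees themselves are not reproduced, and the function name, order, and step size are illustrative assumptions.

```python
import numpy as np

def nlms_short_term_predictor(signal, order=10, mu=0.5):
    """Backward-adaptive linear predictor updated with the normalized-LMS
    gradient rule; returns the per-sample prediction residual."""
    w = np.zeros(order)           # predictor coefficients
    hist = np.zeros(order)        # most recent samples, newest first
    residual = np.zeros(len(signal))
    for n, x in enumerate(signal):
        pred = w @ hist
        e = x - pred
        residual[n] = e
        # normalized gradient step toward smaller squared prediction error
        w += mu * e * hist / (1e-8 + hist @ hist)
        hist = np.roll(hist, 1)
        hist[0] = x
    return residual

# A low-order predictor should shrink the residual energy of a tonal signal.
t = np.arange(2000)
sig = np.sin(0.1 * t) + 0.01 * np.random.randn(2000)
res = nlms_short_term_predictor(sig)
print(np.var(sig), np.var(res[200:]))  # residual variance is much smaller
```
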
Automatic Evaluation of Machine Generated Feedback For Text and Image Data
Pratham Goyal, Anjali Raj, Puneet Kumar, Kishore Babu Nampalle
DOI: 10.1109/MIPR54900.2022.00081 (published 2022-08-01)
Abstract: In this paper, a novel system, 'AutoEvalNet', is developed for evaluating machine-generated feedback in response to multimodal input containing text and images. A new metric, the 'Automatically Evaluated Relevance Score' (AER score), is also defined to automatically compute the similarity between human-generated comments and machine-generated feedback. AutoEvalNet's architecture comprises a pre-trained feedback synthesis model and the proposed feedback evaluation model. It uses an ensemble of Bidirectional Encoder Representations from Transformers (BERT) and Global Vectors for Word Representation (GloVe) models to generate embeddings of the ground-truth comment and the machine-synthesized feedback, from which the similarity score is calculated. Experiments were performed on the MMFeed dataset. The generated feedback was evaluated automatically using the AER score and manually by having human users rate the feedback's relevance to the input and the ground-truth comments. The AER scores and the human evaluation scores are consistent, affirming the AER score's applicability as an automatic evaluation measure for machine-generated text in place of human evaluation.
Citations: 0
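
The abstract does not specify how the BERT and GloVe embeddings are combined into the AER score; one natural reading, averaging cosine similarities across the embedder ensemble, is sketched below. The function names and the hashing bag-of-words stand-in embedder (used only to keep the sketch self-contained and runnable) are assumptions, not the paper's models.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def aer_like_score(reference, candidate, embedders):
    """Average cosine similarity over an ensemble of sentence embedders.
    `embedders` is a list of callables str -> np.ndarray; in the paper these
    would be BERT and GloVe encoders, but any stand-in works here."""
    return float(np.mean([cosine(e(reference), e(candidate)) for e in embedders]))

def toy_embed(text, dim=64):
    """Hashing bag-of-words embedder, a toy stand-in for a real encoder."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

print(aer_like_score("the cat sat", "a cat was sitting", [toy_embed]))
```
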
Active Genetic Learning with Evidential Uncertainty for Identifying Mushroom Toxicity
Oguz Aranay, P. Atrey
DOI: 10.1109/MIPR54900.2022.00078 (published 2022-08-01)
Abstract: Classifying mushrooms as edible or poisonous is an important problem that can have a direct impact on human life. However, most existing works do not include model uncertainty in their analysis and suffer from overconfidence. To solve this problem, we propose a learning framework, called deep active genetic with evidential uncertainty (DAG-EU), to model the uncertainty of the class probabilities when classifying mushrooms. The framework uses genetic algorithms to select the data points with the highest uncertainty and the most influential features. Experimental results on the mushrooms dataset demonstrate that the proposed framework improves classification accuracy by 2.3% compared to methods in the same domain. Moreover, it outperforms other models from the literature by 3.6%.
Citations: 0
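
The abstract does not define the evidential uncertainty measure exactly; the common evidential deep learning formulation (Dirichlet evidence in the style of Sensoy et al.) is sketched below as one plausible reading, with an illustrative two-class (edible/poisonous) example.

```python
import numpy as np

def evidential_uncertainty(logits):
    """Dirichlet-based evidential uncertainty: evidence = relu(logits),
    alpha = evidence + 1, u = K / sum(alpha). The uncertainty u approaches
    1 when there is little evidence for any class."""
    evidence = np.maximum(logits, 0.0)
    alpha = evidence + 1.0
    k = alpha.shape[-1]
    return k / alpha.sum(axis=-1)

# Two-class examples: a confident vs. an uncertain prediction.
print(evidential_uncertainty(np.array([9.0, 0.1])))  # u ~ 0.18: strong evidence
print(evidential_uncertainty(np.array([0.2, 0.3])))  # u ~ 0.8: little evidence
```
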
Weakly Supervised Temporal Action Localization Through Contrastive Learning
Chengzhe Yang, Weigang Zhang
DOI: 10.1109/MIPR54900.2022.00075 (published 2022-08-01)
Abstract: In recent years, weakly-supervised temporal action localization (WS-TAL), which uses only video-level annotations to learn whether each frame of an untrimmed video contains an action, has gained increasing attention. Most existing WS-TAL methods rely heavily on the learned features for action localization, so it is important to improve the ability to separate the frames of action instances from background frames. To address this challenge, this paper introduces a framework that learns two extra constraints, Action-Background Learning and Action-Foreground Learning. The former maximizes the discrepancy between action and background features, while the latter avoids misjudging action instances. We evaluate the proposed model on two benchmark datasets, and the experimental results show that the method achieves performance comparable to current state-of-the-art WS-TAL methods.
Citations: 0
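
As a rough sketch of the Action-Background Learning constraint (maximizing the discrepancy between action and background features), here is one simple cosine-margin formulation in PyTorch. The loss shape, the margin, and the use of mean-pooled snippet features are assumptions, since the abstract does not give the exact loss.

```python
import torch
import torch.nn.functional as F

def action_background_loss(action_feats, background_feats, margin=0.5):
    """Penalizes cosine similarity between the mean action feature and the
    mean background feature once it exceeds 1 - margin, pushing the two
    apart in feature space."""
    a = F.normalize(action_feats.mean(dim=0), dim=0)
    b = F.normalize(background_feats.mean(dim=0), dim=0)
    return F.relu(torch.dot(a, b) - (1.0 - margin))

# Toy usage: 30 action snippets vs. 50 background snippets, 128-d features.
loss = action_background_loss(torch.randn(30, 128), torch.randn(50, 128))
```
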
Facial Expression Recognition in the Wild: Dataset Configurations
Nathan Galea, D. Seychell
DOI: 10.1109/MIPR54900.2022.00045 (published 2022-08-01)
Abstract: Facial Expression Recognition (FER) in the wild has become an increasingly significant and focused area within computer vision, with many studies tackling different aspects to improve recognition accuracy. This paper uses RAF-DB and AffectNet, the two leading datasets in this area, and compares different experimental dataset configurations with two state-of-the-art techniques, the Amend Representation Module (ARM) and the Self-Cure Network (SCN). The paper demonstrates that dataset configuration should be a main focal point for improving the FER task, and that significant improvements are unlikely without a favorable dataset.
Citations: 0
A Highly Optimized GPU Batched Elasticnet Solver (BENS) with Application to Real-Time Keypoint Detection for Image Retrieval
Zheng Guo, Thanh Hong-Phuoc, N. Khan, L. Guan
DOI: 10.1109/MIPR54900.2022.00070 (published 2022-08-01)
Abstract: In this paper, we present a highly optimized GPU batched elastic-net solver (BENS) with application to real-time keypoint detection for image retrieval. BENS was optimized to perform hundreds of thousands of small elastic-net fits by batching specific steps of the elastic-net computation across fits into a single large matrix multiplication, which can be computed efficiently using the cuBLAS library. The main motivation for BENS was a real-time implementation of the Sparse-Coding Keypoint detector (SCK) algorithm, which has far-reaching applications in science, engineering, social science, and medicine. Applying BENS to accelerate SCK, we achieved a 232x speedup compared to the original CPU implementation of SCK. To demonstrate the newly accelerated SCK algorithm, we conducted a BoVW-based image retrieval experiment using SCK as the keypoint detector.
Citations: 0
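
To make the batching idea concrete, here is a CPU sketch of many small elastic-net fits solved jointly with ISTA, where each iteration reduces to two batched matrix multiplications, the same structure BENS maps onto large cuBLAS GEMMs. The ISTA solver, step-size choice, and parameter names are illustrative assumptions; the paper's actual solver and its specific batched steps are not reproduced.

```python
import numpy as np

def soft(x, thresh):
    """Elementwise soft-thresholding, the proximal operator of the L1 term."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def batched_elastic_net(A, y, l1=0.1, l2=0.1, steps=200):
    """Solve B independent elastic-net problems
        min_x 0.5*||y_b - A_b x_b||^2 + l1*||x_b||_1 + 0.5*l2*||x_b||^2
    with ISTA. Every iteration is a pair of batched matmuls (einsum here;
    on GPU these become one large GEMM). A: (B, m, n), y: (B, m) -> (B, n)."""
    B, m, n = A.shape
    # step size from the largest Lipschitz constant across the batch
    L = max(np.linalg.norm(a, 2) ** 2 for a in A) + l2
    t = 1.0 / L
    x = np.zeros((B, n))
    for _ in range(steps):
        r = np.einsum("bmn,bn->bm", A, x) - y        # batched A @ x - y
        g = np.einsum("bmn,bm->bn", A, r) + l2 * x   # batched A^T @ r + ridge
        x = soft(x - t * g, t * l1)
    return x

# 1000 small fits, solved in one batched loop instead of 1000 separate solves.
A = np.random.randn(1000, 20, 50)
y = np.random.randn(1000, 20)
x = batched_elastic_net(A, y)
```
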
An Overview of Recent Work in Multimedia Forensics
Kratika Bhagtani, Amit Kumar Singh Yadav, Emily R. Bartusiak, Ziyue Xiang, Ruiting Shao, Sriram Baireddy, E. Delp
DOI: 10.1109/MIPR54900.2022.00064 (published 2022-08-01)
Abstract: In this paper, we review recent work in media forensics for digital images, video, audio, and documents.
Citations: 6