{"title":"Expressive Speech-Driven Facial Animation with Controllable Emotions","authors":"Yutong Chen, Junhong Zhao, Weiqiang Zhang","doi":"10.1109/ICMEW59549.2023.00073","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00073","url":null,"abstract":"It is in high demand to generate facial animation with high realism, but it remains a challenging task. Existing approaches of speech-driven facial animation can produce satisfactory mouth movement and lip synchronization, but show weakness in dramatic emotional expressions and flexibility in emotion control. This paper presents a novel deep learning-based approach for expressive facial animation generation from speech that can exhibit wide-spectrum facial expressions with controllable emotion type and intensity. We propose an emotion controller module to learn the relationship between the emotion variations (e.g., types and intensity) and the corresponding facial expression parameters. It enables emotion-controllable facial animation, where the target expression can be continuously adjusted as desired. The qualitative and quantitative evaluations show that the animation generated by our method is rich in facial emotional expressiveness while retaining accurate lip movement, outperforming other state-of-the-art methods.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123727257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skeletonmae: Spatial-Temporal Masked Autoencoders for Self-Supervised Skeleton Action Recognition","authors":"Wenhan Wu, Yilei Hua, Ce Zheng, Shi-Bao Wu, Chen Chen, Aidong Lu","doi":"10.1109/ICMEW59549.2023.00045","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00045","url":null,"abstract":"Self-supervised skeleton-based action recognition has attracted more attention in recent years. By utilizing the unlabeled data, more generalizable features can be learned to alleviate the overfitting problem and reduce the demand for massive labeled training data. Inspired by the MAE [1], we propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition (SkeletonMAE). Following MAE's masking and reconstruction pipeline, we utilize a skeleton-based encoder-decoder transformer architecture to reconstruct the masked skeleton sequences. A novel masking strategy, named Spatial-Temporal Masking, is introduced in terms of both joint-level and frame-level for the skeleton sequence. This pre-training strategy makes the encoder output generalizable skeleton features with spatial and temporal dependencies. Given the unmasked skeleton sequence, the encoder is fine-tuned for the action recognition task. Extensive experiments show that our SkeletonMAE achieves remarkable performance and outperforms the state-of-the-art methods on both NTU RGB+D 60 and NTU RGB+D 120 datasets.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122523007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aesthetic Visual Question Answering of Photographs","authors":"Xin Jin, Wu Zhou, Xinghui Zhou, Shuai Cui, Le Zhang, Jianwen Lv, Shu Zhao","doi":"10.1109/ICMEW59549.2023.00068","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00068","url":null,"abstract":"Aesthetic assessment of images can be categorized into two main forms: numerical assessment and language assessment. In this paper, we propose a new task of aesthetic language assessment: aesthetic visual question and answering (AVQA) of images. We use images from www.flickr.com. The objective QA pairs are generated by the proposed aesthetic attributes analysis algorithms. Moreover, we introduce subjective QA pairs that are converted from aesthetic numerical labels and sentiment analysis from large-scale pre-train models. We build the first aesthetic visual question answering dataset, AesVQA, that contains 72,168 high-quality images and 324,756 pairs of aesthetic questions. This is the first work that both addresses the task of aesthetic VQA and introduces subjectiveness into VQA tasks. The experimental results reveal that our methods outperform other VQA models on this new task.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133184370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}