{"title":"Unsupervised 3D Animal Canonical Pose Estimation with Geometric Self-Supervision","authors":"Xiaowei Dai, Shuiwang Li, Qijun Zhao, Hongyu Yang","doi":"10.1109/FG57933.2023.10042785","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042785","url":null,"abstract":"Although analyzing animal shape and pose has potential applications in many fields, there is little work on 3D animal pose estimation. This can be attributed to two aspects: the lack of large-scale well-annotated datasets, and perspective ambiguities which make it difficult to map 2D space to 3D space. To address data scarcity, we propose an unsupervised method to estimate 3D animal pose, given only 2D poses. To deal with perspective ambiguities, we introduce a canonical consistency loss and a camera consistency loss to impose geometric priors in the training process, and combine the reprojection loss and the 2D pose discriminator to enable self-supervised learning. Specifically, given a 2D pose, the pose generator network generates a corresponding 3D pose and the camera network estimates a camera rotation. During training, the generated 3D pose is randomly reprojected onto camera viewpoints to synthesize a new 2D pose. The synthesized 2D pose is decomposed into a 3D pose and a camera rotation, based on which consistency losses are imposed in both 3D canonical poses and camera rotations for self-supervised training. We evaluate the proposed method on real and synthetic datasets, i.e., SMAL and AcinoSet. The experimental results demonstrate the effectiveness of the proposed method and we achieve state-of-the-art performance among unsupervised algorithms for 3D animal canonical pose estimation.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123584100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Effective Global Receptive Field for Facial Expression Recognition","authors":"Jiayi Han, Ang Li, Donghong Han, Jianfeng Feng","doi":"10.1109/FG57933.2023.10042628","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042628","url":null,"abstract":"Facial expression recognition (FER) remains a challenging task despite years of effort because of the variations in view angles and human poses and the exclusion of expression-relevant facial parts. In this work, we propose to learn effective Global receptive field and Class-sensitive metrics for FER, namely GCNet which contains a Class-sensitive metric learning module (CSMLM) and mobile dilation modules (MDMs). CSMLM fully takes advantage of the variation in human faces to extract class-sensitive and spatially consistent features to improve the effectiveness of FER. MDM utilizes cascaded dilation convolution layers to achieve a global receptive field. However, directly adding a dilation convolution layer to a given sequence of convolution layers may face the gridding problem, which leads to sparse feature maps. In this work, we find the upper bound of the dilation rate of the additional convolution layer that avoids the gridding problem. Experiments show that the proposed approach reaches state-of-the-art (SOTA) performance on the RAF-DB, FER-Plus, and SFEW2.0 datasets.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121630629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Style Transfer for Multimodal Data-Driven Gesture Synthesis","authors":"Mireille Fares, C. Pelachaud, Nicolas Obin","doi":"10.1109/FG57933.2023.10042658","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042658","url":null,"abstract":"We propose a multimodal speech driven approach to generate 2D upper-body gestures for virtual agents, in the communicative style of different speakers, seen or unseen by our model during training. Upper-body gestures of a source speaker are generated based on the content of his/her multimodal data - speech acoustics and text semantics. The synthesized source speaker's gestures are conditioned on the multimodal style representation of the target speaker. Our approach is zero-shot, and can generalize the style transfer to new unseen speakers, without any additional training. An objective evaluation is conducted to validate our approach.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127034756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters","authors":"Joakim Gustafson, Éva Székely, Simon Alexandersson, J. Beskow","doi":"10.1109/FG57933.2023.10042520","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042520","url":null,"abstract":"Embodied conversational agents and social robots need to be able to generate spontaneous behavior in order to be believable in social interactions. We present a system that can generate spontaneous speech with supporting lip movements. The conversational TTS voice is trained on a podcast corpus that has been prosodically tagged (f0, speaking rate and energy) and transcribed (including tokens for breathing, fillers and laughter). We introduce a speech animation algorithm where articulatory effort can be adjusted. The speech animation is driven by time-stamped phonemes obtained from the internal alignment attention map of the TTS system, and we use prominence estimates from the synthesised speech waveform to modulate the lip- and jaw movements accordingly.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"40 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133604014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial 3D Face Disentanglement of Identity and Expression","authors":"Yajie Gu, Nick E. Pears, Hao Sun","doi":"10.1109/FG57933.2023.10042602","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042602","url":null,"abstract":"We propose a new framework to decompose 3D facial shape into identity and expression. Existing 3D face disentanglement methods assume the presence of a corresponding neutral (i.e. identity) face for each subject. Our method designs an identity discriminator to obviate this requirement. This is a binary classifier that determines if two input faces are from the same identity, and encourages the synthesised identity face to have the same identity features as the input face and to approach the ‘apathy’ expression. To this end, we take advantage of adversarial learning to train a PointNet-based variational auto-encoder and discriminator. Comprehensive experiments are employed on CoMA, BU3DFE, and FaceScape datasets. Results demonstrate state-of-the-art performance with the option of operating in a more versatile application setting of no known neutral ground truths. Code is available at https://github.com/rmraaron/FaceExpDisentanglement.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"371 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116566900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Laplacian ICP for Progressive Registration of 3D Human Head Meshes","authors":"Nick E. Pears, H. Dai, William Smith, Haobo Sun","doi":"10.1109/FG57933.2023.10042743","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042743","url":null,"abstract":"We present a progressive 3D registration framework that is a highly-efficient variant of classical non-rigid Iterative Closest Points (N-ICP). Since it uses the Laplace-Beltrami operator for deformation regularisation, we view the overall process as Laplacian ICP (L-ICP). This exploits a ‘small deformation per iteration’ assumption and is progressively coarse-to-fine, employing an increasingly flexible deformation model, an increasing number of correspondence sets, and increasingly sophisticated correspondence estimation. Correspondence matching is only permitted within predefined vertex subsets derived from domain-specific feature extractors. Additionally, we present a new benchmark and a pair of evaluation metrics for 3D non-rigid registration, based on annotation transfer. We use this to evaluate our framework on a publicly-available dataset of 3D human head scans (Headspace). The method is robust and only requires a small fraction of the computation time compared to the most popular classical approach, yet has comparable registration performance.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128150741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Pose Estimation with Shape Aware Loss","authors":"Lin Fang, Shangfei Wang","doi":"10.1109/FG57933.2023.10042691","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042691","url":null,"abstract":"Although the mean square error (mse) of heatmap is an intuitive loss for heatmap-based human pose estimation, the joints localization accuracy may not be improved when heatmap mse reduces. In this paper, we show that a great cause for such misalignment is the unnecessary requirement from heatmap mse on the irrelevant Gaussian parameter, i.e. maximum. The coordinate prediction is precise as long as the probability distribution held by the predicted heatmap is a well-shaped Gaussian distribution and has the same center as the ground truth. However, heatmap mse unnecessarily requires the Gaussian distribution to hold the same maximum as the ground truth. Correspondingly, we introduce mse on the image gradients of the target and predicted heatmap (referred to as gradmap mse) to focus on the shape of the heatmap. Combining heatmap and gradmap mse, we propose a simple yet effective Shape Aware Loss (SAL) method. Being model-agnostic, our method can benefit various existing models. We apply SAL to the three latest network architectures and obtain performance improvements for all of them. Comparisons of the visualized predicted heatmaps further prove the effectiveness of the proposed method.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114840394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Deep Multi-Task Learning Using Semantically Orthogonal Spaces and Application to Facial Attributes Prediction","authors":"Arnaud Dapogny, Gauthier Tallec, Jules Bonnard, Edouard Yvinec, Kévin Bailly","doi":"10.1109/FG57933.2023.10042750","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042750","url":null,"abstract":"Deep learning-based multi-task approaches usually rely on factorizing representation layers up to a certain point, where the network splits into several heads, each one addressing a specific task. Depending on the inter-task correlation, such naive model may or may not allow the tasks to benefit from each others. In this paper, we propose a novel Semantic Orthogonality Spaces (SOS) method for multi-task problems, where each task is predicted using the information from a common subspace that factorizes information among all tasks, as well as a task-specific subspace. We enforce orthogonality between these tasks by applying soft orthogonality constraints, as well as adversarially-learned semantic orthogonality objectives that ensures that predicting one task requires the specific information related to that task. We demonstrate the effectiveness of SOS on synthetic data, as well as for large-scale facial attributes prediction. In particular, we use SOS to craft a lightweight architecture that provides high-end accuracies on CelebA database.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124833411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Controllable Facial Micro-element Synthesis using Segmentation Maps","authors":"Yujin Kim, I. Park","doi":"10.1109/FG57933.2023.10042571","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042571","url":null,"abstract":"In facial image synthesis, the style of the source image is converted using a reference image, or images with different styles are synthesized by each attribute using a facial attribute segmentation map. However, previous works cannot deal with the fine areas because the style is changed mostly in large areas such as hair, eyes, and mouth. To overcome the limitation, we propose a novel method of synthesizing a facial image with micro-level facial elements. A deep learning-based high-resolution image synthesis model is employed after generating a label image from the face RGB image through skin micro-element segmentation and face attribute segmentation. In the process of generating a label image for synthesizing skin micro-elements, we propose a technique for controlling skin micro-elements, enabling the generation of various label images from a single face label image. Throughout the proposed method, the areas of skin micro-elements can be edited and different skin types can be simulated. The experimental results show that the generated face is significantly improved by applying the proposed method. Moreover, various faces can be synthesized by changing the types and stages of skin micro-elements.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133258389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acceptability and Trustworthiness of Virtual Agents by Effects of Theory of Mind and Social Skills Training","authors":"Hiroki Tanaka, Takeshi Saga, Kota Iwauchi, Satoshi Nakamura","doi":"10.1109/FG57933.2023.10042781","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042781","url":null,"abstract":"We constructed a social skills training system using virtual agents and developed a new training module for four basic tasks: declining, requesting, praising, and listening. Previous work demonstrated that a virtual agent's theory of mind influences the building of trust between agents and users. The purpose of this study is to explore the effect of trustworthiness, acceptability, familiarity, and likeability on the agents' theory of mind and the social skills training contents. In our experiment, 29 participants rated the trustworthiness and acceptability of the virtual agent after watching a video that featured levels of theory of mind and social skills training. Their ratings were obtained using self-evaluation measures at each stage. We confirmed that our users' trust and acceptability of the virtual agent were significantly changed depending on the level of the virtual agent's theory of mind. We also confirmed that the users' trust and acceptability in the trainer tended to improve after the social skills training.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122230633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}