S. Mozgai, Sarah Beland, Andrew Leeds, Jade G. Winn, Cari Kaurloto, D. Heylen, Arno Hartholt
{"title":"Toward a Scoping Review of Social Intelligence in Virtual Humans","authors":"S. Mozgai, Sarah Beland, Andrew Leeds, Jade G. Winn, Cari Kaurloto, D. Heylen, Arno Hartholt","doi":"10.1109/FG57933.2023.10042532","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042532","url":null,"abstract":"As the demand for socially intelligent Virtual Humans (VHs) increases, so follows the demand for effective and efficient cross-discipline collaboration that is required to bring these VHs “to life”. One avenue for increasing cross-discipline fluency is the aggregation and organization of seemingly disparate areas of research and development (e.g., graphics and emotion models) that are essential to the field of VH research. Our initial investigation (1) identifies and catalogues research streams concentrated in three multidisciplinary VH topic clusters within the domain of social intelligence, Emotion, Social Behavior, and The Face, (2) brings to the forefront key themes and prolific authors within each topic cluster, and (3) provides evidence that a full scoping review is warranted to further map the field, aggregate research findings, and identify gaps in the research. To enable collaboration, we provide full access to the refined VH cluster datasets, key word and author word clouds, as well as interactive evidence maps.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130299181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Parameshwara, Ibrahim Radwan, Subramanian Ramanathan, Roland Goecke
{"title":"Examining Subject-Dependent and Subject-Independent Human Affect Inference from Limited Video Data","authors":"R. Parameshwara, Ibrahim Radwan, Subramanian Ramanathan, Roland Goecke","doi":"10.1109/FG57933.2023.10042798","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042798","url":null,"abstract":"Continuous human affect estimation from video data entails modelling the dynamic emotional state from a sequence of facial images. Though multiple affective video databases exist, they are limited in terms of data and dynamic annotations, as assigning continuous affective labels to video data is subjective, onerous and tedious. While studies have established the existence of signature facial expressions corresponding to the basic categorical emotions, individual differences in emoting facial expressions nevertheless exist; factoring out these idiosyncrasies is critical for effective emotion inference. This work explores continuous human affect recognition using AFEW-VA, an ‘in-the-wild’ video dataset with limited data, employing subject-independent (SI) and subject-dependent (SD) settings. The SI setting involves the use of training and test sets with mutually exclusive subjects, while training and test samples corresponding to the same subject can occur in the SD setting. A novel, dynamically-weighted loss function is employed with a Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) architecture to optimise dynamic affect prediction. Superior prediction is achieved in the SD setting, as compared to the SI counterpart.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132674468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Inference for Human Mesh Recovery with Vision Transformer","authors":"Hanbyel Cho, Jaesung Ahn, Yooshin Cho, Junmo Kim","doi":"10.1109/FG57933.2023.10042731","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042731","url":null,"abstract":"Human Mesh Recovery (HMR) from an image is a challenging problem because of the inherent ambiguity of the task. Existing HMR methods utilized either temporal information or kinematic relationships to achieve higher accuracy, but there is no method using both. Hence, we propose “Video Inference for Human Mesh Recovery with Vision Transformer (HMR-ViT)” that can take into account both temporal and kinematic information. In HMR-ViT, a Temporal-kinematic Feature Image is constructed using feature vectors obtained from video frames by an image encoder. When generating the feature image, we use a Channel Rearranging Matrix (CRM) so that similar kinematic features could be located spatially close together. The feature image is then further encoded using Vision Transformer, and the SMPL pose and shape parameters are finally inferred using a regression network. Extensive evaluation on the 3DPW and Human3.6M datasets indicates that our method achieves a competitive performance in HMR.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130479521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DisVAE: Disentangled Variational Autoencoder for High-Quality Facial Expression Features","authors":"Tianhao Wang, Mingyue Zhang, Lin Shang","doi":"10.1109/FG57933.2023.10042668","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042668","url":null,"abstract":"Facial expression feature extraction suffers from high inter-subject variations caused by identity-related personal attributes. The extracted expression features are consistently entangled with other identity-related features, which affects related facial expression tasks such as recognition and editing. To achieve high-quality expression features, a Disentangled Variational Autoencoder (DisVAE) is proposed to disentangle expression and identity features. The identity features are first removed from the facial features via facial image reconstruction, and the remaining features then represent expression components. Extensive experiments on three public datasets have shown that the proposed DisVAE can effectively disentangle expression and identity features, and extract expression features without the interference of identity attributes. The high-quality expression features improve the performance of facial expression recognition and can be well applied to facial expression editing.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121471308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Performance of Facial Biometrics With Quality-Driven Dataset Filtering","authors":"Iurii Medvedev, Nuno Gonçalves","doi":"10.1109/FG57933.2023.10042579","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042579","url":null,"abstract":"Advancements in deep learning techniques and the availability of large-scale face datasets have led to significant performance gains in face recognition in recent years. Modern face recognition algorithms are trained on large-scale in-the-wild face datasets. At the same time, many facial biometric applications rely on controlled image acquisition and enrollment procedures (for instance, document security applications). As a result, such face recognition approaches can underperform in the target scenario (ICAO-compliant images). However, modern approaches for face image quality estimation may help to mitigate that problem. In this work, we introduce a strategy for filtering training datasets by quality metrics and demonstrate that it can lead to performance improvements in biometric applications that rely on the face image modality. We filter the main academic datasets using the proposed filtering strategy and present performance metrics.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124631864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ngoc Hoang Nguyen, Nhat Nguyen Xuan, T. Bui, Dao Huu Hung, S. Q. Truong, V. Hoang
{"title":"An efficient approach for real-time abnormal human behavior recognition on surveillance cameras","authors":"Ngoc Hoang Nguyen, Nhat Nguyen Xuan, T. Bui, Dao Huu Hung, S. Q. Truong, V. Hoang","doi":"10.1109/FG57933.2023.10042648","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042648","url":null,"abstract":"In recent years, abnormal human behavior recognition has become an attractive research topic in computer vision due to the rapid growth of demand to monitor human activities on closed-circuit television (CCTV) cameras. However, developing a deep learning-based model for abnormal/violent behavior recognition in surveillance systems is still quite challenging and costly due to inadequate data and model complexity. This paper presents an efficient approach to recognize violent behavior such as fighting, sexual harassment, and fence climbing in real-time on a multi-camera-one-edge-device system. Our approach develops a lightweight 3DCNN model trained by an effective optimization process to recognize human behavior from sequence frames of CCTV video signal input. In the optimization method, we combine two deep learning techniques, knowledge distillation and contrastive learning, to enhance the quality of the lightweight model in recognizing recorded human behaviors, which can help the student network learn distilled information from both the bigger model and contrastive object representations. We also establish a large CCTV human behavior video dataset containing 4,200 abnormal and 24,000 normal videos. The effectiveness of the proposed approach is shown by the high inference performance and impressive results evaluated on both public datasets (RWF-2000 and UCF101) and our collected datasets.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"323 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115909612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nele Loecher, S. King, J. Cabo, T. Neal, Kristin A Kosyluk
{"title":"Assessing the Efficacy of a Self-Stigma Reduction Mental Health Program with Mobile Biometrics: Work-in-Progress","authors":"Nele Loecher, S. King, J. Cabo, T. Neal, Kristin A Kosyluk","doi":"10.1109/FG57933.2023.10042655","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042655","url":null,"abstract":"One of the strongest predictors of success in post-secondary education is student engagement. Unfortunately, people with psychiatric disabilities are less engaged in their campus communities. This work-in-progress paper details the disclosure-based self-stigma reduction program, Up To Me, which is developed to increase inclusion and engagement of people with mental illness on college campuses by teaching strategies to weigh costs and benefits of disclosing one's mental illness. Further, we elaborate on the program's evaluation mechanisms, which involve both self-reported and passively recorded smartphone sensor data. The latter reflects a unique merging of behavioral and computer sciences that serves to facilitate behavioral modeling using artificial intelligence as an objective measure of Up to Me outcomes. Similar to data collection for some activity and biometric recognition applications, we employ a publicly available and free-to-use smartphone sensor reading app to correlate self-reported well-being with Up to Me participant behaviors. We anticipate that the behavioral data gathered via smartphones will substantiate self-report data on Up to Me outcomes.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125637334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic face imaging: a novel analysis framework for 4D social face perception and expression","authors":"L. Snoek, Rachael E. Jack, P. Schyns","doi":"10.1109/FG57933.2023.10042724","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042724","url":null,"abstract":"Measuring facial expressions is a notoriously difficult and time-consuming process, often involving manual labeling of low-dimensional descriptors such as Action Units (AUs). Computer vision algorithms provide automated alternatives for measuring and annotating face shape and expression from 2D images, but often ignore the complexities of dynamic 3D facial expressions. Moreover, open-source implementations are often difficult to use, preventing widespread adoption by the wider scientific community beyond computer vision. To address these issues, we develop dynamic face imaging, a novel analysis framework to study social face perception and expression. We use state-of-the-art 3D face reconstruction models to quantify face movement as temporal shape deviations in a common 3D mesh topology, which disentangles global (head) movement and local (facial) movement. Using a set of validation analyses, we test different reconstruction algorithms and quantify how well they reconstruct facial “action units” and track key facial landmarks in 3D, demonstrating promising performance and highlighting areas for improvement. We provide an open-source software package that implements functionality for easy reconstruction, preprocessing, and analysis of these dynamic facial expression data.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127833639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SignNet: Single Channel Sign Generation using Metric Embedded Learning","authors":"Tejaswini Ananthanarayana, Lipisha Chaudhary, Ifeoma Nwogu","doi":"10.1109/FG57933.2023.10042711","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042711","url":null,"abstract":"A true interpreting agent not only understands sign language and translates to text, but also understands text and translates to signs. Much of the AI work in sign language translation to date has focused mainly on translating from signs to text. Towards the latter goal, we propose a text-to-sign translation model, SignNet, which exploits the notion of similarity (and dissimilarity) of visual signs in translating. The module presented is only one part of a dual-learning, two-task process involving text-to-sign (T2S) as well as sign-to-text (S2T). We currently implement SignNet as a single channel architecture so that the output of the T2S task can be fed into S2T in a continuous dual learning framework. By single channel, we refer to a single modality, the body pose joints. In this work, we present SignNet, a T2S task using a novel metric embedding learning process, to preserve the distances between sign embeddings relative to their dissimilarity. We also describe how to choose positive and negative examples of signs for similarity testing. From our analysis, we observe that metric embedding learning-based models perform significantly better than the other models with traditional losses, when evaluated using BLEU scores. In the task of gloss to pose, SignNet performed as well as its state-of-the-art (SoTA) counterparts and outperformed them in the task of text to pose, by showing noteworthy enhancements in BLEU 1 - BLEU 4 scores (BLEU 1: 31 → 39; ≈26% improvement and BLEU 4: 10.43 → 11.84; ≈14% improvement) when tested on the popular RWTH PHOENIX-Weather-2014T benchmark dataset.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"21 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116943448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Face Anti-Spoofing via Multi-Task Learning and One-Side Meta Triplet Loss","authors":"Chu-Chun Chuang, Chien-Yi Wang, S. Lai","doi":"10.1109/FG57933.2023.10042685","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042685","url":null,"abstract":"With the increasing variations of face presentation attacks, model generalization becomes an essential challenge for a practical face anti-spoofing system. This paper presents a generalized face anti-spoofing framework that consists of three tasks: depth estimation, face parsing, and live/spoof classification. With the pixel-wise supervision from the face parsing and depth estimation tasks, the regularized features can better distinguish spoof faces. While simulating domain shift with meta-learning techniques, the proposed one-side triplet loss can further improve the generalization capability by a large margin. Extensive experiments on four public datasets demonstrate that the proposed framework and training strategies are more effective than previous works for model generalization to unseen domains.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129741128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}