{"title":"Technical Program Committee FG 2023","authors":"","doi":"10.1109/fg57933.2023.10042682","DOIUrl":"https://doi.org/10.1109/fg57933.2023.10042682","url":null,"abstract":"","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"360 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122786529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Contact Based Modeling of Enervation","authors":"Kais Riani, Salem Sharak, M. Abouelenien, Mihai Burzo, Rada Mihalcea, John Elson, C. Maranville, K. Prakah-Asante, W. Manzoor","doi":"10.1109/FG57933.2023.10042529","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042529","url":null,"abstract":"Significant research is currently carried out with a focus on autonomous vehicles; research is starting to focus on areas such as the modeling of occupant states and behavioral elements. This paper contributes to this line of research by developing a pipeline that extracts physiological signals from thermal imagery and modeling occupant enervation using a fully non-contact based approach. These signals are obtained via a multimodal dataset of 36 subjects across multiple channels, including the thermal and physiological modalities. Moreover, we provide a comparative analysis of non-contact and contact based channels to model the enervation state of individuals. Our analysis indicates that non-contact physiological signals extracted from thermal imagery can reach and exceed the performance of contact-based physiological signals. In addition, modeling of enervation is possible using said non-contact physiological signals and thermal features, with an accuracy of up to 70% in identifying energized and enervated occupant states. Our findings provide a novel approach for future research and opens the possibility for integration of unrestrictive sensors in future automobiles.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124667465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relation-aware Network for Facial Expression Recognition","authors":"Xin Ma, Yingdong Ma","doi":"10.1109/FG57933.2023.10042525","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042525","url":null,"abstract":"Facial expression recognition (FER) is a challenging computer vision task due to problems including intra-class variation, occlusion, head-pose variation, etc. The convolutional neural networks (CNNs) have been widely adopted to implement facial expression classification. While convolutional operation captures local information effectively, CNN-models ignore relations between pixels and channels. In this work, we present a Relation-aware Network (RANet) for facial expression classification. RANet is composed of two relational attention modules to construct relationships of spatial positions and channels. Global relationships help RANet focusing on discriminative facial regions to alleviate the above problems. The separable convolution has been applied to compute spatial attention efficiently. Experimental results demonstrate that our proposed method achieves 89.57% and 65.09% accuracy rate on the RAF-DB dataset and the AffectNet-7 dataset, respectively.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134530678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling Culturally Diverse Smiles Using Data-Driven Methods","authors":"Chaona Chen, Oliver G. B. Garrod, P. Schyns, Rachael E. Jack","doi":"10.1109/FG57933.2023.10042621","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042621","url":null,"abstract":"Smiling faces are often preferred in daily social interactions. Many socially interactive human-like virtual agents are equipped with the capability to produce standardized smiles that are widely considered to be universal. However, mounting evidence shows that people from different cultures prefer different smiles. To engage a culturally diverse range of human users, socially interactive human-like virtual agents must be equipped with culturally-valid dynamic facial expressions. To develop culturally sensitive smiles, we use data-driven, perception-based methods to model the facial expressions of happy in 60 individuals in two distinct cultures (East Asian and Western European). On each experimental trial, we generated a random facial animation composed of a random sub-set of individual face movements (i.e., AUs), each with a random movement. Each cultural participant categorized 2400 such facial animations according to an emotion label (e.g., happy) if appropriate, otherwise selecting ‘other.’ We derived facial expression models of happy for each cultural participant by measuring the statistical relationship between the dynamic AUs presented on each trial and each participant's responses. Analysis of the facial expression models revealed clear cross-cultural similarity and diversity in smiles–for example, smiling with raised cheeks (AU12-6) is culturally common, while open-mouth smiling (AU25-12) is Western-specific and smiling with eyebrow raising (AU1-2) is East Asian-specific. Analysis of the temporal dynamics of each AU further revealed cultural diversity in smiles. We anticipate that our approach will improve the social signalling capabilities of socially interactive human-like virtual agents and broaden their usability in global market.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133892498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RavenGaze: A Dataset for Gaze Estimation Leveraging Psychological Experiment Through Eye Tracker","authors":"Tao Xu, Borimandafu Wu, Yuqiong Bai, Yun Zhou","doi":"10.1109/FG57933.2023.10042793","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042793","url":null,"abstract":"One major challenge in appearance-based gaze estimation is the lack of high-quality labeled data. Establishing databases or datasets is a way to obtain accurate gaze data and test methods or tools. However, the methods of collecting data in existing databases are designed on artificial chasing target tasks or unintentional free-looking tasks, which are not natural and real eye interactions and cannot reflect the inner cognitive processes of humans. To fill this gap, we propose the first gaze estimation dataset collected from an actual psychological experiment by the eye tracker, called the RavenGaze dataset. We design an experiment employing Raven's Matrices as visual stimuli and collecting gaze data, facial videos as well as screen content videos simultaneously. Thirty-four participants were recruited. The results show that the existing algorithms perform well on our RavenGaze dataset in the 3D and 2D gaze estimation task, and demonstrate good generalization ability according to cross-dataset evaluation task. RavenGaze and the establishment of the benchmark lay the foundation for other researchers to do further in-depth research and test their methods or tools. Our dataset is available at https://intelligentinteractivelab.github.io/datasets/RavenGaze/index.html.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133345421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Localization using Multi-Focal Spatial Attention for Masked Face Recognition","authors":"Yooshin Cho, Hanbyel Cho, Hyeong Gwon Hong, Jaesung Ahn, Dongmin Cho, JungWoo Chang, Junmo Kim","doi":"10.1109/FG57933.2023.10042672","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042672","url":null,"abstract":"Since the beginning of world-wide COVID-19 pandemic, facial masks have been recommended to limit the spread of the disease. However, these masks hide certain facial attributes. Hence, it has become difficult for existing face recognition systems to perform identity verification on masked faces. In this context, it is necessary to develop masked Face Recognition (MFR) for contactless biometric recognition systems. Thus, in this paper, we propose Complementary Attention Learning and Multi-Focal Spatial Attention that precisely removes masked region by training complementary spatial attention to focus on two distinct regions: masked regions and backgrounds. In our method, standard spatial attention and networks focus on unmasked regions, and extract mask-invariant features while minimizing the loss of the conventional Face Recognition (FR) performance. For conventional FR, we evaluate the performance on the IJB-C, Age-DB, CALFW, and CPLFW datasets. We evaluate the MFR performance on the ICCV2021-MFR/Insightface track, and demonstrate the improved performance on the both MFR and FR datasets. Additionally, we empirically verify that spatial attention of proposed method is more precisely activated in unmasked regions.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123581921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Feature Selection for Detecting Mothers' Depression in Dyadic Interactions with their Adolescent Offspring","authors":"Maneesh Bilalpur, Saurabh Hinduja, Laura A. Cariola, Lisa B. Sheeber, Nick Alien, László A. Jeni, Louis-Philippe Morency, J. Cohn","doi":"10.1109/FG57933.2023.10042796","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042796","url":null,"abstract":"Depression is the most common psychological disorder, a leading cause of disability world-wide, and a major contributor to inter-generational transmission of psychopathol-ogy within families. To contribute to our understanding of depression within families and to inform modality selection and feature reduction, it is critical to identify interpretable features in developmentally appropriate contexts. Mothers with and without depression were studied. Depression was defined as history of treatment for depression and elevations in current or recent symptoms. We explored two multimodal feature selection strategies in dyadic interaction tasks of mothers with their adolescent children for depression detection. Modalities included face and head dynamics, facial action units, speech-related behavior, and verbal features. The initial feature space was vast and inter-correlated (collinear). To reduce dimension-ality and gain insight into the relative contribution of each modality and feature, we explored feature selection strategies using Variance Inflation Factor (VIF) and Shapley values. On an average collinearity correction through VIF resulted in about 4 times feature reduction across unimodal and multimodal features. Collinearity correction was also found to be an optimal intermediate step prior to Shapley analysis. Shapley feature selection following VIF yielded best performance. The top 15 features obtained through Shapley achieved 78 % accuracy. The most informative features came from all four modalities sampled, which supports the importance of multimodal feature selection.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128982193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are we in sync during turn switch?","authors":"Jieyeon Woo, Liu Yang, C. Achard, C. Pelachaud","doi":"10.1109/FG57933.2023.10042799","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042799","url":null,"abstract":"During an interaction, people exchange speaking turns by coordinating with their partners. Exchanges can be done smoothly, with pauses between turns or through interruptions. Previous studies have analyzed various modalities to investigate turn shifts and their types (smooth turn exchange, overlap, and interruption). Modality analyses were also done to study the interpersonal synchronization which is observed throughout the whole interaction. Likewise, we intend to analyze different modalities to find a relationship between the different turn switch types and interpersonal synchrony. In this study, we provide an analysis of multimodal features, focusing on prosodic features (F0 and loudness), head activity, and facial action units, to characterize different switch types.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126797423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical Parametric Synthesis of Realistic Pseudo-Random Face Shapes","authors":"Igor Borovikov, K. Levonyan, Mihai Anghelescu","doi":"10.1109/FG57933.2023.10042771","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042771","url":null,"abstract":"There is a growing demand for populating virtual worlds with large numbers of realistic-looking characters. Besides hand-crafted characters like the main protagonists in video games, the virtual worlds may also need massive numbers of secondary characters. Manual authoring of their features is not usually practical. For parametric models of human faces, a naive approach randomizes all the parameters of the human face to generate a random one. However, the uniform or hand-crafted distribution of the shape authoring parameters is unlikely to represent value ranges and correlations present naturally in human faces. The paper proposes a simple automated method for generating realistic-looking head shapes via learned mapping between latent space like the FaceNet embedding and the explicit parametric space used by the character modeling tools. Our approach is simple, robust, and can efficiently generate a large variety of head shapes with a predictable dissimilarity.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121230702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camera Self-Calibration Using Human Faces","authors":"Masa Hu, Garrick Brazil, Nanxiang Li, Liu Ren, Xiaoming Liu","doi":"10.1109/FG57933.2023.10042701","DOIUrl":"https://doi.org/10.1109/FG57933.2023.10042701","url":null,"abstract":"Despite recent advancements in depth estimation and face alignment, it remains difficult to predict the distance to a human face in arbitrary videos due to the lack of camera calibration. A typical pipeline is to perform calibration with a checkerboard before the video capture, but this is inconvenient to users or impossible for unknown cameras. This work proposes to use the human face as the calibration object to estimate metric depth information and camera intrinsics. Our novel approach alternates between optimizing the 3D face and the camera intrinsics parameterized by a neural network. Compared to prior work, our method performs camera calibration on a larger variety of videos captured by unknown cameras. Further, due to the face prior, our method is more robust to noise in 2D observations compared to previous self-calibration methods. We show that our method improves calibration and depth prediction accuracy over prior works on both synthetic and real data. Code will be available at https://github.com/yhu9/FaceCalibration.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131361415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}