{"title":"Viewpoint Integration for Hand-Based Recognition of Social Interactions from a First-Person View.","authors":"Sven Bambach, David J Crandall, Chen Yu","doi":"10.1145/2818346.2820771","DOIUrl":"https://doi.org/10.1145/2818346.2820771","url":null,"abstract":"<p><p>Wearable devices are becoming part of everyday life, from first-person cameras (GoPro, Google Glass), to smart watches (Apple Watch), to activity trackers (FitBit). These devices are often equipped with advanced sensors that gather data about the wearer and the environment. These sensors enable new ways of recognizing and analyzing the wearer's everyday personal activities, which could be used for intelligent human-computer interfaces and other applications. We explore one possible application by investigating how egocentric video data collected from head-mounted cameras can be used to recognize social activities between two interacting partners (e.g. playing chess or cards). In particular, we demonstrate that just the positions and poses of hands within the first-person view are highly informative for activity recognition, and present a computer vision approach that detects hands to automatically estimate activities. While hand pose detection is imperfect, we show that combining evidence across first-person views from the two social partners significantly improves activity recognition accuracy. This result highlights how integrating weak but complimentary sources of evidence from social partners engaged in the same task can help to recognize the nature of their interaction.</p>","PeriodicalId":74508,"journal":{"name":"Proceedings of the ... ACM International Conference on Multimodal Interaction. ICMI (Conference)","volume":"2015 ","pages":"351-354"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2818346.2820771","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35459048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Detection of Depression in Clinical Interviews.","authors":"Hamdi Dibeklioğlu, Zakia Hammal, Ying Yang, Jeffrey F Cohn","doi":"10.1145/2818346.2820776","DOIUrl":"https://doi.org/10.1145/2818346.2820776","url":null,"abstract":"<p><p>Current methods for depression assessment depend almost entirely on clinical interview or self-report ratings. Such measures lack systematic and efficient ways of incorporating behavioral observations that are strong indicators of psychological disorder. We compared a clinical interview of depression severity with automatic measurement in 48 participants undergoing treatment for depression. Interviews were obtained at 7-week intervals on up to four occasions. Following standard cut-offs, participants at each session were classified as remitted, intermediate, or depressed. Logistic regression classifiers using leave-one-out validation were compared for facial movement dynamics, head movement dynamics, and vocal prosody individually and in combination. Accuracy (remitted versus depressed) for facial movement dynamics was higher than that for head movement dynamics; and each was substantially higher than that for vocal prosody. Accuracy for all three modalities together reached 88.93%, exceeding that for any single modality or pair of modalities. These findings suggest that automatic detection of depression from behavioral indicators is feasible and that multimodal measures afford most powerful detection.</p>","PeriodicalId":74508,"journal":{"name":"Proceedings of the ... ACM International Conference on Multimodal Interaction. ICMI (Conference)","volume":"2015 ","pages":"307-310"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2818346.2820776","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34416673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dyadic Behavior Analysis in Depression Severity Assessment Interviews.","authors":"Stefan Scherer, Zakia Hammal, Ying Yang, Louis-Philippe Morency, Jeffrey F Cohn","doi":"10.1145/2663204.2663238","DOIUrl":"10.1145/2663204.2663238","url":null,"abstract":"<p><p>Previous literature suggests that depression impacts vocal timing of both participants and clinical interviewers but is mixed with respect to acoustic features. To investigate further, 57 middle-aged adults (men and women) with Major Depression Disorder and their clinical interviewers (all women) were studied. Participants were interviewed for depression severity on up to four occasions over a 21 week period using the Hamilton Rating Scale for Depression (HRSD), which is a criterion measure for depression severity in clinical trials. Acoustic features were extracted for both participants and interviewers using COVAREP Toolbox. Missing data occurred due to missed appointments, technical problems, or insufficient vocal samples. Data from 36 participants and their interviewers met criteria and were included for analysis to compare between high and low depression severity. Acoustic features for participants varied between men and women as expected, and failed to vary with depression severity for participants. For interviewers, acoustic characteristics strongly varied with severity of the interviewee's depression. Accommodation - the tendency of interactants to adapt their communicative behavior to each other - between interviewers and interviewees was inversely related to depression severity. These findings suggest that interviewers modify their acoustic features in response to depression severity, and depression severity strongly impacts interpersonal accommodation.</p>","PeriodicalId":74508,"journal":{"name":"Proceedings of the ... ACM International Conference on Multimodal Interaction. ICMI (Conference)","volume":"2014 ","pages":"112-119"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2663204.2663238","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34857329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic detection of pain intensity.","authors":"Zakia Hammal, Jeffrey F Cohn","doi":"10.1145/2388676.2388688","DOIUrl":"10.1145/2388676.2388688","url":null,"abstract":"<p><p>Previous efforts suggest that occurrence of pain can be detected from the face. Can intensity of pain be detected as well? The Prkachin and Solomon Pain Intensity (PSPI) metric was used to classify four levels of pain intensity (none, trace, weak, and strong) in 25 participants with previous shoulder injury (McMaster-UNBC Pain Archive). Participants were recorded while they completed a series of movements of their affected and unaffected shoulders. From the video recordings, canonical normalized appearance of the face (CAPP) was extracted using active appearance modeling. To control for variation in face size, all CAPP were rescaled to 96×96 pixels. CAPP then was passed through a set of Log-Normal filters consisting of 7 frequencies and 15 orientations to extract 9216 features. To detect pain level, 4 support vector machines (SVMs) were separately trained for the automatic measurement of pain intensity on a frame-by-frame level using both 5-folds cross-validation and leave-one-subject-out cross-validation. F1 for each level of pain intensity ranged from 91% to 96% and from 40% to 67% for 5-folds and leave-one-subject-out cross-validation, respectively. Intra-class correlation, which assesses the consistency of continuous pain intensity between manual and automatic PSPI was 0.85 and 0.55 for 5-folds and leave-one-subject-out cross-validation, respectively, which suggests moderate to high consistency. These findings show that pain intensity can be reliably measured from facial expression in participants with orthopedic injury.</p>","PeriodicalId":74508,"journal":{"name":"Proceedings of the ... ACM International Conference on Multimodal Interaction. ICMI (Conference)","volume":"2012 ","pages":"47-52"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7385931/pdf/nihms-1599641.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38205962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.","authors":"Arman Savran, Houwei Cao, Miraj Shah, Ani Nenkova, Ragini Verma","doi":"10.1145/2388676.2388781","DOIUrl":"https://doi.org/10.1145/2388676.2388781","url":null,"abstract":"We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word level sub challenges, respectively.","PeriodicalId":74508,"journal":{"name":"Proceedings of the ... ACM International Conference on Multimodal Interaction. ICMI (Conference)","volume":"2012 ","pages":"485-492"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2388676.2388781","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32734741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}