{"title":"Social stances by virtual smiles","authors":"M. Ochs, C. Pelachaud, K. Prepin","doi":"10.1109/WIAMIS.2013.6616144","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616144","url":null,"abstract":"When two persons participate to a discussion, they not only exchange about the concepts and ideas they are dis-cussing, but they also express stances with regard to content of their speech (called epistemic stances) and to convey their interpersonal relationship (called interpersonal stances). The stances can be expressed through non-verbal behaviors, for instance smiles. Stances are also co-constructed by their interactants through simultaneous or sequential behaviors such as the alignment of speaker's and listener's smiles. In this paper, we present several studies exploring the stances (epistemic, interpersonal, and co-constructed) that the social signal of smile may convey. We propose to analyze different contextual levels to highlight how users' engagement and discourse context influence their perception of the virtual characters' stances.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129470435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nested infinite Gaussian mixture model for identifying known and unknown audio events","authors":"Y. Sasaki, Kazuyoshi Yoshii, S. Kagami","doi":"10.1109/WIAMIS.2013.6616152","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616152","url":null,"abstract":"This paper presents a novel statistical method that can classify given audio events into known classes or recognize them as an unknown class. We propose a nested infinite Gaussian mixture model (iGMM) to represent varied audio events in real environment. One of the main problems of conventional classification methods is that we need to specify a fixed number of classes in advance. Therefore, all audio events are forced to be classified into known classes. To solve the problem, the proposed method formulates a infinite Gaussian mixture model (iGMM) in which the number of classes are allowed to increase without bound. Another problem is that the complexity of each audio event is different. Then, the nested iGMM using nonparametric Bayesian approach is applied to adjust the needed dimension of each audio model. Experimental results show the effectiveness for these two problems to represent the given audio events.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121392425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On coding and resampling of video in 4:2:2 chroma format for cascaded coding applications","authors":"Andrea Gabriellini, M. Mrak","doi":"10.1109/WIAMIS.2013.6616153","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616153","url":null,"abstract":"Throughout the broadcasting chain 4:2:2 chroma format is widely used even if some parts of the chain require other formats (4:2:0 or 4:4:4). This paper presents an approach to coding video content in 4:2:2 chroma format using resampling of chroma samples. All subsequent video coding operations are then carried out at the new chroma format. The choice of filter for resampling the reconstructed video signal is sent to the decoder in the compressed bit-stream. This paper investigates choices of resampling filters and coding parameters associated with the proposed approach with a goal to minimise conversion losses. Coding performance of possible solutions are reported for two reversible resampling filter pairs when applied in the emerging HEVC video coding standard.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114025345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-driven retrieval in collaborative photo collections","authors":"M. Brenner, E. Izquierdo","doi":"10.1109/WIAMIS.2013.6616121","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616121","url":null,"abstract":"We present an approach to retrieve photos relating to social events in collaborative photo collections. Compared to traditional approaches that typically consider only the visual features of photos as a source of information, we incorporate multiple additional contextual cues like date and time, location and usernames to improve retrieval performance. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"108 1-3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131519749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing motion information in dense feature classifiers","authors":"Claudiu Tanase, B. Mérialdo","doi":"10.1109/WIAMIS.2013.6616132","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616132","url":null,"abstract":"Semantic concept detection in large scale video collections is mostly achieved through a static analysis of selected keyframes. A popular choice for representing the visual content of an image is based on the pooling of local descriptors such as Dense SIFT. However, simple motion features such as optic flow can be extracted relatively easy from such keyframes. In this paper we propose an efficient addition to the DSIFT approach by including information derived from optic flow. Based on optic flow magnitude, we can estimate for each DSIFT patch whether it is static or moving. We modify the bag of words model used traditionally with DSIFT by creating two separate occurrence histograms instead of one: one for static patches and one for dynamic patches. We further refine this method by studying different separation thresholds and soft assign-ment, as well as different normalization techniques. Classifier score fusion is used to maximize the average precision of all these variants. Experimental results on the TRECVID Semantic Indexing collection show that by means of classifier fusion our method increases overall mean average precision of the DSIFT classifier from 0.061 to 0.106.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"62 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131027801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affine invariant salient patch descriptors for image retrieval","authors":"F. Isikdogan, A. A. Salah","doi":"10.1109/WIAMIS.2013.6616136","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616136","url":null,"abstract":"Image description constitutes a major part of matching-based tasks in computer vision. The size of descriptors becomes more important for retrieval tasks in large datasets. In this paper, we propose a compact and robust image description algorithm for image retrieval, which consists of three main stages: salient patch extraction, affine invariant feature computation over concentric elliptical tracks on the patch, and global feature incorporation. We evaluate the performance of our algorithm for region-based image retrieval and image reuse detection, a special case of image retrieval. We present a novel synthetic image reuse dataset, which is generated by superimposing objects on different background images with systematic transformations. Our results show that the proposed descriptor is effective for this problem.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127662969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tapped delay multiclass support vector machines for industrial workflow recognition","authors":"Eftychios E. Protopapadakis, A. Doulamis, N. Doulamis","doi":"10.1109/WIAMIS.2013.6616141","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616141","url":null,"abstract":"In this paper, a tapped delay multiclass support vector machine scheme is used for supervised job classification, based on video data taken from Nissan factory. The procedure is based on multiclass SVMs enhanced with the time dimension by incorporating additional information of n-th previous frames and allowing for user feedback when necessary. Such methodology will support the visual supervision of industrial environments by providing essential information to the supervisors and supporting their job.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126468455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An application framework for implicit sentiment human-centered tagging using attributed affect","authors":"K. C. Apostolakis, P. Daras","doi":"10.1109/WIAMIS.2013.6616145","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616145","url":null,"abstract":"In this paper, a novel framework for implicit sentiment image tagging and retrieval is presented, based on the concept of attributed affect. The user's affective response is recorded and analyzed to provide an appropriate affective label, while eye gaze is monitored in order to identify a specific object depicted in the scene, which is attributed as the cause of the user's current state of core affect. Through this procedure, automatic tagging of content, as well as retrieval based on personal preferences is possible. Our experiments show that our framework successfully channels behavioral tags (in the form of affective labels) to the data tagging and retrieval loop, even when applied in the context of a cost-efficient, widely available hardware setup, that uses a single low resolution webcam mounted on a standard modern computer system.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121572885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Footstep detection and classification using distributed microphones","authors":"K. Nakadai, Yuta Fujii, S. Sugano","doi":"10.1109/WIAMIS.2013.6616127","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616127","url":null,"abstract":"This paper addresses footstep detection and classification with multiple microphones distributed on the floor. We propose to introduce geometrical features such as position and velocity of a sound source for classification which is estimated by amplitude-based localization. It does not require precise inter-microphone time synchronization unlike a conventional microphone array technique. To classify various types of sound events, we introduce four types of features, i.e., time-domain, spectral and Cepstral features in addition to the geometrical features. We constructed a prototype system for footstep detection and classification based on the proposed ideas with eight microphones aligned in a 2-by-4 grid manner. Preliminary classification experiments showed that classification accuracy for four types of sound sources such as a walking footstep, running footstep, handclap, and utterance maintains over 70% even when the signal-to-noise ratio is low, like 0 dB. We also confirmed two advantages with the proposed footstep detection and classification. One is that the proposed features can be applied to classification of other sound sources besides footsteps. The other is that the use of a multichannel approach further improves noise-robustness by selecting the best microphone among the microphones, and providing geometrical information on a sound source.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132249181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A heuristic for distance fusion in cover song identification","authors":"Alessio Degani, M. Dalai, R. Leonardi, P. Migliorati","doi":"10.1109/WIAMIS.2013.6616128","DOIUrl":"https://doi.org/10.1109/WIAMIS.2013.6616128","url":null,"abstract":"In this paper, we propose a method to integrate the results of different cover song identification algorithms into one single measure which, on the average, gives better results than initial algorithms. The fusion of the different distance measures is made by projecting all the measures in a multi-dimensional space, where the dimensionality of this space is the number of the considered distances. In our experiments, we test two distance measures, namely the Dynamic Time Warping and the Qmax measure when applied in different combinations to two features, namely a Salience feature and a Harmonic Pitch Class Profile (HPCP). While the HPCP is meant to extract purely harmonic descriptions, in fact, the Salience allows to better discern melodic differences. It is shown that the combination of two or more distance measure improves the overall performance.","PeriodicalId":408077,"journal":{"name":"2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115701028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}