{"title":"Feature Level Fusion for Bimodal Facial Action Unit Recognition","authors":"Zibo Meng, Shizhong Han, Min Chen, Yan Tong","doi":"10.1109/ISM.2015.116","DOIUrl":"https://doi.org/10.1109/ISM.2015.116","url":null,"abstract":"Recognizing facial actions from spontaneous facial displays suffers from subtle and complex facial deformation, frequent head movements, and partial occlusions. It is especially challenging when the facial activities are accompanied with speech. Instead of employing information solely from the visual channel, this paper presents a novel fusion framework, which exploits information from both visual and audio channels in recognizing speech-related facial action units (AUs). In particular, features are first extracted from visual and audio channels, independently. Then, the audio features are aligned with the visual features in order to handle the difference in time scales and the time shift between the two signals. Finally, these aligned audio and visual features are integrated via a feature-level fusion framework and utilized in recognizing AUs. Experimental results on a new audiovisual AU-coded dataset have demonstrated that the proposed feature-level fusion framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially for those AUs that are \"invisible\" in the visual channel during speech. The improvement is more impressive with occlusions on the facial images, which, fortunately, would not affect the audio channel.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121448682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network Adaptive Textured Mesh Generation for Collaborative 3D Tele-Immersion","authors":"Kevin Desai, K. Bahirat, S. Raghuraman, B. Prabhakaran","doi":"10.1109/ISM.2015.111","DOIUrl":"https://doi.org/10.1109/ISM.2015.111","url":null,"abstract":"3D Tele-Immersion (3DTI) has emerged as an efficient environment for virtual interactions and collaborations in a variety of fields like rehabilitation, education, gaming, etc. In 3DTI, geographically distributed users are captured using multiple cameras and immersed in a single virtual environment. The quality of experience depends on the available network bandwidth, quality of the 3D model generated and the time taken for rendering. In a collaborative environment, achieving high quality, high frame rate rendering by transmitting data to multiple sites having different bandwidth is challenging. In this paper we introduce a network adaptive textured mesh generation scheme to transmit varying quality data based on the available bandwidth. To reduce the volume of information transmitted, a visual quality based vertex selection approach is used to generate a sparse representation of the user. This sparse representation is then transmitted to the receiver side where a sweep-line based technique is used to generate a 3D mesh of the user. High visual quality is maintained by transmitting a high resolution texture image compressed using a lossy compression algorithm. In our studies users were unable to notice visual quality variations of the rendered 3D model even at 90% compression.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classquake: Measuring Students' Attentiveness in the Classroom","authors":"Kai Michael Hover, M. Muhlhauser","doi":"10.1109/ism.2015.24","DOIUrl":"https://doi.org/10.1109/ism.2015.24","url":null,"abstract":"","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134125097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foveated High Efficiency Video Coding for Low Bit Rate Transmission","authors":"I. Cheng, Masha Mohammadkhani, A. Basu, F. Dufaux","doi":"10.1109/ISM.2015.37","DOIUrl":"https://doi.org/10.1109/ISM.2015.37","url":null,"abstract":"This work describes the design and subjective performance of Foveated High Efficiency Video Coding (FHEVC). Even though foveation has been widely used for various forms of compression since the early 1990s, we believe its use to improve HEVC is new. We consider the application of, possibly moving, foveated compression in this work and evaluate scenarios where it can be used to improve perceptual quality of videos under constrained transmission resources, e.g., bandwidth. A new method to reduce artifacts during remapping is also proposed. The preliminary implementation considers a single fovea only. Experiments summarizing user evaluations are presented to validate our implementation.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123811358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame Synchronization of Live Video Streams Using Visible Light Communication","authors":"Maziar Mehrabi, S. Lafond, Le Wang","doi":"10.1109/ISM.2015.26","DOIUrl":"https://doi.org/10.1109/ISM.2015.26","url":null,"abstract":"With the growth of heterogeneous social media networks and the widespread use of camera-equipped handheld devices, interactive video broadcasting services are emerging on the Internet. When a media server combines and broadcasts live-streaming video contents received from heterogeneous camera equipped devices filming a common scene from different angles, the time-based alignment of the audio and video streams is required. Although many techniques and methods for video stream synchronization have been in use or proposed, these solutions are not suitable for a non-centralized multi-camera system consisting of for example heterogeneous camera-equipped smart phones. This paper proposes a novel approach by harnessing the capabilities of Visible Light Communication (VLC) to provide a robust and efficient way to synchronize video streams. This paper presents the design and implementation of a VLC-based video synchronization prototype. The synchronization of different video streams is provided by the means of VLC through Light Emitting Diode (LED) lights and digital phone cameras. This is achieved by embedding the necessary information as light patterns in the video content which can later be extracted by processing the video streams. The main benefit of our approach is the ability to use off-the-shelf cameras as it does not require any modification of software or hardware components in the camera devices. Moreover, the means of VLC can be exploited to carry other types of information such as position so that the receiver of the video stream can have a notion of the location in which the video was recorded.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116235884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic MCU Placement for Video Conferencing on Peer-to-Peer Network","authors":"Md. Amjad Hossain, J. Khan","doi":"10.1109/ISM.2015.125","DOIUrl":"https://doi.org/10.1109/ISM.2015.125","url":null,"abstract":"In this paper, we investigate a novel Multipoint Video Conferencing (MVC) architecture potentially suitable for Peer-to-Peer (P2P) platform, such as Gnutella. In particular, we present an election protocol (extension to Gnutella) where the Multipoint Control Unit (MCU) of the MVC is dynamically migrated among peers when new peer joins or leaves. Simulation result shows that this improves overall conferencing performance compared to the system with static MCU by minimizing total traffic, individual node hotness, and video composition delay.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116847974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an Efficient Algorithm to Get the Chorus of a Salsa Song","authors":"Camilo Arévalo, M. GerardoM.Sarria, M. Mora, Carlos A. Arce-Lopera","doi":"10.1109/ISM.2015.42","DOIUrl":"https://doi.org/10.1109/ISM.2015.42","url":null,"abstract":"A well-known musical genre and part of Latin-American cultural identity is Salsa. To be able to perform a scientific analysis of this genre, the first step to take is to analyze the structure of Salsa songs. Furthermore, the most representative part of Salsa is the chorus. In this paper we detail the design and implementation of an algorithm developed for getting the chorus of any Salsa song.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129273833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Saliency-Aware Distributed Compressive Video Sensing","authors":"Jin Xu, S. Djahel, Yuansong Qiao","doi":"10.1109/ISM.2015.54","DOIUrl":"https://doi.org/10.1109/ISM.2015.54","url":null,"abstract":"Distributed compressive video sensing (DCVS) is an emerging low-complexity video coding framework which integrates the merits of distributed video coding (DVC) and compressive sensing (CS). Because the human visual system (HVS) is the ultimate receiver of visual signals, we aim to improve the perceptual rate-distortion performance of DCVS by designing a novel scalable saliency-aware DCVS codec. Firstly, we perform saliency estimation in the the side information (SI) frame generated at the decoder side and adaptively control the size of region-of-interest (ROI) according to the measurements budget by applying a saliency guided foveation model. Subsequently, based on online estimation of the correlation noise between a non-key frame and its SI, we develop a saliency-aware block compressive sensing scheme to more accurately reconstruct the ROI of each non-key frame. The obtained experimental results reveal that our DCVS codec outperforms the legacy DCVS codecs in terms of the perceptual rate-distortion performance.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122237307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Super-Resolution Method Using Spatio-Temporal Registration of Multi-Scale Components in Consideration of Color-Sampling Patterns of UHDTV Cameras","authors":"Y. Matsuo, S. Sakaida","doi":"10.1109/ISM.2015.57","DOIUrl":"https://doi.org/10.1109/ISM.2015.57","url":null,"abstract":"Ultra high-definition television (UHDTV) video contain many similar objects in a single-frame because it has high self-similarity caused by its high resolution. In addition, typical UHDTV cameras have one-CMOS sensor with a Bayer or other color-sampling pattern. A super-resolution method using single-frame registration of an original image and its multi-scale components is therefore proposed. Furthermore, this registration performs similarly for this original image and multi-scale components in past and future images of this original image. Accuracy of the registration is enhanced by compensating the registration results in consideration of color-sampling patterns of UHDTV cameras. Experiments show that the proposed method provides an objectively better PSNR measurement and a subjectively better appearance in comparison with the conventional and state-of-the-art super-resolution methods.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127896000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal and Spatial Evolution through Images","authors":"F. Branco, Nuno Correia, A. Rodrigues, João Gouveia, Rui Nóbrega","doi":"10.1109/ISM.2015.105","DOIUrl":"https://doi.org/10.1109/ISM.2015.105","url":null,"abstract":"Image matching algorithms are used in image search, classification and retrieval but are also useful to show how urban structures evolve over time. Images have the power to illustrate and evoke past events and can be used to show the evolution of structures such as buildings and other elements present in the urban landscape. The paper describes a process and a tool to provide a chronological journey through time, given a set of photographs from different time periods. The developed tool provides the ability to generate visualizations of a geographic location, given a set of related images, taken at different periods in time. It automatically processes comparisons of images and establishes relationships between them. It also offers a semi-automated method to define relationships between parts of images.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"226 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121480102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}