{"title":"Batch Mode Active Learning for Multimedia Pattern Recognition","authors":"Shayok Chakraborty, V. Balasubramanian, S. Panchanathan","doi":"10.1109/ISM.2012.101","DOIUrl":"https://doi.org/10.1109/ISM.2012.101","url":null,"abstract":"Multimedia applications like face recognition and facial expression recognition inherently rely on the availability of a large amount of labeled data to train a robust recognition system. In order to induce a reliable classification model for a multimedia pattern recognition application, the data is typically labeled by human experts based on some domain knowledge. However, manual annotation of a large number of images is an expensive process in terms of time, labor and human expertise. This has led to the development of active learning algorithms, which automatically identify the salient instances from a given set of unlabeled data and are effective in reducing the human annotation effort to train a classification model. Further, to address the possible presence of multiple labeling oracles, there have been efforts towards a batch form of active learning, where a set of unlabeled images are selected simultaneously for labeling instead of a single image at a time. Existing algorithms on batch mode active learning concentrate only on the development of a batch selection criterion and assume that the batch size (number of samples to be queried from an unlabeled set) to be specified in advance. However, in multimedia applications like face/facial expression recognition, it is difficult to decide on a batch size in advance because of the dynamic nature of video streams. Further, multimedia applications like facial expression recognition involve a fuzzy label space because of the imprecision and the vagueness in the class label boundaries. This necessitates a BMAL framework, for fuzzy label problems. To address these fundamental challenges, we propose two novel BMAL techniques in this work: (i) a framework for dynamic batch mode active learning, which adaptively selects the batch size and the specific instances to be queried based on the complexity of the data stream being analyzed and (ii) a BMAL algorithm for fuzzy label classification problems. To the best of our knowledge, this is the first attempt to develop such techniques in the active learning literature.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115387929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EDContours: High-Speed Parameter-Free Contour Detector Using EDPF","authors":"C. Akinlar, C. Topal","doi":"10.1109/ISM.2012.37","DOIUrl":"https://doi.org/10.1109/ISM.2012.37","url":null,"abstract":"We present a high-speed contour detector, which we name EDContours, that works by running our real-time parameter-free edge segment detector, Edge Drawing Parameter Free (EDPF), at different scale-space representations of an image. Combining the edge segments detected by EDPF at different scales, EDContours generates a soft contour map for a given image. EDContours works on gray-scale images, is parameter-free, runs very fast, and results in an F-measure score of 0.62 on the Berkeley Segmentation Dataset (BSDS300).","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123139642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Model Hypotheses for Player Segmentation and Rendering in Free-Viewpoint Soccer Video","authors":"Haopeng Li, M. Flierl","doi":"10.1109/ISM.2012.47","DOIUrl":"https://doi.org/10.1109/ISM.2012.47","url":null,"abstract":"This paper presents a player segmentation approach based on 3D model hypotheses for soccer games. We use a hyper plane model for player modeling and a collection of piecewise geometric models for background modeling. To determine the assignment of each pixel in the image plane, we test it with two model hypotheses. We construct a cost function that measures the fitness of model hypotheses for each pixel. To fully utilize the perspective diversity of the multiview imagery, we propose a three-step strategy to choose the best model for each pixel. The experimental results show that our segmentation approach based on 3D model hypotheses outperforms conventional temporal median and graph cut methods for both subjective and objective evaluation.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"9 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116786875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEEVE-Remote: A Novel User-Interaction Solution for 3D Tele-immersive System","authors":"Pengye Xia, K. Nahrstedt, M. A. Jurik","doi":"10.1109/ISM.2012.77","DOIUrl":"https://doi.org/10.1109/ISM.2012.77","url":null,"abstract":"3D Tele-immersion (3DTI) system enables geographically distributed users to interact with each other in the virtual 3D space. Many 3DTI applications require users to have frequent physical movement in the application (e.g, 3D interactive exergaming, remote therapy). However, traditional user interaction (UI) solution (which includes large display, mouse/keyboard) for 3DTI system does not give users much freedom to move during the interaction and thus has difficulties to meet this requirement. In this work, we design and implement a novel UI solution TEEVE-Remote which utilizes state-of-the-art camera, mobile phone and display technologies to overcome the difficulties and therefore significantly improve the user experience of 3DTI system.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121141727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Visual Quality Analysis for Media Production","authors":"Hannes Fassold, Stefanie Wechtitsch, Albert Hofmann, W. Bailer, P. Schallauer, R. Borgotallo, A. Messina, Mohan Liu, P. Ndjiki-Nya, Peter Altendorf","doi":"10.1109/ISM.2012.82","DOIUrl":"https://doi.org/10.1109/ISM.2012.82","url":null,"abstract":"Automatic quality control for audiovisual media is an important tool in the media production process. In this paper we present tools for assessing the quality of audiovisual content in order to decide about the reusability of archive content. We first discuss automatic detectors for the common impairments noise and grain, video breakups, sharpness, image dynamics and blocking. For the efficient viewing and verification of the automatic results by an operator, three approaches for user interfaces are presented. Finally, we discuss the integration of the tools into a service oriented architecture, focusing on the recent standardization efforts by EBU and AMWA's Joint Task Force on a Framework for Interoperability of Media Services in TV Production (FIMS).","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123587383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detailed Comparative Analysis of VP8 and H.264","authors":"Yousef O. Sharrab, Nabil J. Sarhan","doi":"10.1109/ISM.2012.33","DOIUrl":"https://doi.org/10.1109/ISM.2012.33","url":null,"abstract":"VP8 has recently been offered by Google as an open video compression format in attempt to compete with the widely used H.264 video compression standard. This paper describes the major differences between VP8 and H.264 and provides detailed comparative evaluations through extensive experiments. We use 29 raw video sequences, offering a wide spectrum of resolutions and content characteristics, with the resolution ranging from 176×144 (QCIF) to 3840×2160 (2160p). To ensure a fair study, we use 3 coding presets in H.264, each with three types of tuning, and 7 presets in VP8. The presets cover a variety of achieved quality or complexity levels. The performance metrics include accuracy of bit rate handling, encoding speed, decoding speed, and perceptual video quality.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114183089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Classification of Teeth in Bitewing Dental Images Using OLPP","authors":"Nourdin Al-sherif, G. Guo, H. Ammar","doi":"10.1109/ISM.2012.26","DOIUrl":"https://doi.org/10.1109/ISM.2012.26","url":null,"abstract":"Teeth classification is an important component in building an Automated Dental Identification System (ADIS) as part of creating a data structure that guides tooth-to-tooth matching. This aids in avoiding illogical comparisons that both inefficiently consume the limited computational resources and mislead decision-making. We tackle this problem by using low computational-cost, appearance-based Orthogonal Locality Preserving Projection (OLPP) algorithm to assign an initial class, i.e. molar or premolar to the teeth in bitewing dental images. After this initial classification, we use a string matching technique, based on teeth neighborhood rules, to validate initial teeth-classes and thus assign each tooth a number corresponding to its location in the dental chart. On a large dataset of bitewing films that contain 622 teeth, the proposed approach achieves classification accuracy of 89% and teeth class validation enhances the overall teeth classification accuracy to 92%.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130657651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Your Needs: An Adaptive VoD System","authors":"Mu Mu, W. Knowles, N. Race","doi":"10.1109/ISM.2012.55","DOIUrl":"https://doi.org/10.1109/ISM.2012.55","url":null,"abstract":"Video-on-demand (VoD) is becoming a popular service for commercial content distribution by offering end users the freedom to access recorded programmes. The management of on-demand assets is essential to maximise the efficiency of storage and network utilisation as well as advertisement. This paper introduces our recent efforts in design and implementation of an adaptive VoD archive system in an IPTV infrastructure. The system exploits live statistics on the user behaviours as well as the dynamic popularity of VoD programmes. Using the modelled programme popularity function, the VoD archive is capable of managing the VoD repository by adapting to the most recent user requests. The design has greatly improved the activity of VoD repository and user experience in on-demand services.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123675926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Fuzziness in Extended Local Ternary Patterns","authors":"W. Liao","doi":"10.1109/ISM.2012.36","DOIUrl":"https://doi.org/10.1109/ISM.2012.36","url":null,"abstract":"Local binary/ternary patterns are widely employed to describe the structure of an image region. However, local patterns are very sensitive to noise due to the thresholding process. In this paper, we propose two different approaches to incorporate fuzziness in extended local ternary patterns (ELTP) to enhance the robustness of this class of operator to interferences. The first approach replaces the ternary mapping mechanism with fuzzy member functions to arrive at a fuzzy ELTP representation. The second approach modifies the clustering operation in formulating ELTP to a fuzzy C-means procedure to construct soft histograms in the final feature representation, denoted as FCM-ELTP. Both fuzzy descriptors have proven to exhibit better resistance to noise in the experiments designed to compare the performance of ELTP and the newly proposed fuzzy ELTP and FCM-ELTP.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"12 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124377070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color-Weakness Compensation Using Riemann Normal Coordinates","authors":"S. Oshima, Rika Mochizuki, R. Lenz, J. Chao","doi":"10.1109/ISM.2012.42","DOIUrl":"https://doi.org/10.1109/ISM.2012.42","url":null,"abstract":"We introduce normal coordinates in Riemann spaces as a tool to construct color-weak compensation methods. We use them to compute color stimuli for a color weak observers that result in the same color perception as the original image presented to a color normal observer in the sense that perceived color-differences are identical for both. The compensation is obtained through a color-difference-preserving map, i.e. an isometry between the 3D color spaces of a color-normal and any given color-weak observer. This approach uses discrimination threshold data and is free from approximation errors due to local linearization. The performance is evaluated with the help of semantic differential (SD) tests.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124128038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}