{"title":"Fusion of Time-of-Flight and Phase Shifting for high-resolution and low-latency depth sensing","authors":"Yueyi Zhang, Zhiwei Xiong, Feng Wu","doi":"10.1109/ICME.2015.7177426","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177426","url":null,"abstract":"Depth sensors based on Time-of-Flight (ToF) and Phase Shifting (PS) have complementary strengths and weaknesses. ToF can provide real-time depth but limited in resolution and sensitive to noise. PS can generate accurate and robust depth with high resolution but requires a number of patterns that leads to high latency. In this paper, we propose a novel fusion framework to take advantages of both ToF and PS. The basic idea is using the coarse depth from ToF to disambiguate the wrapped depth from PS. Specifically, we address two key technical problems: cross-modal calibration and interference-free synchronization between ToF and PS sensors. Experiments demonstrate that the proposed method generates accurate and robust depth with high resolution and low latency, which is beneficial to tremendous applications.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133643387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Latent Dirichlet Allocation for non-iid social tags","authors":"Jiangchao Yao, Ya Zhang, Zhe Xu, Jun-wei Sun, Jun Zhou, Xiao Gu","doi":"10.1109/ICME.2015.7177490","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177490","url":null,"abstract":"Topic models have been widely used for analyzing text corpora and achieved great success in applications including content organization and information retrieval. However, different from traditional text data, social tags in the web containers are usually of small amounts, unordered, and non-iid, i.e., it is highly dependent on contextual information such as users and objects. Considering the specific characteristics of social tags, we here introduce a new model named Joint Latent Dirichlet Allocation (JLDA) to capture the relationships among users, objects, and tags. The model assumes that the latent topics of users and those of objects jointly influence the generation of tags. The latent distributions is then inferred with Gibbs sampling. Experiments on two social tag data sets have demonstrated that the model achieves a lower predictive error and generates more reasonable topics. We also present an interesting application of this model to object recommendation.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116939995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy and area efficient hardware implementation of 4K Main-10 HEVC decoder in Ultra-HD Blu-ray player and TV systems","authors":"Tsu-Ming Liu, Yung-Chang Chang, Chih-Ming Wang, Hue-Min Lin, Chia-Yun Cheng, Chun-Chia Chen, Min-Hao Chiu, Sheng-Jen Wang, P. Chao, Meng-Jye Hu, Fu-Chun Yeh, Shun-Hsiang Chuang, Hsiu-Yi Lin, Ming-Long Wu, Che-Hong Chen, Chia-Lin Ho, Chi-Cheng Ju","doi":"10.1109/ICME.2015.7177399","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177399","url":null,"abstract":"A 4K and Main-10 HEVC video decoder LSI is fabricated in a 28nm CMOS process. It adopts a block-concealed processor (BcP) to improve the visual quality and a bandwidth-suppressed processor (BsP) is newly designed to reduce 30% and 45% of external data accesses in playback and gaming scenario, respectively. It features fully core scalable (FCS) architecture which lowers the required working frequency by 65%. A 10-bit compact scheme is proposed to reduce the frame buffer space by 37.5%. Moreover, a multi-standard architecture reduces are by 28%. It achieves 530Mpixels/s throughput which is two times larger than the state-of-the-art HEVC design [2] and consumes 0.2nJ/pixel energy efficiency, enabling real-time 4K video playback in Ultra-HD Blu-ray player and TV systems.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133050584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-graph multi-instance learning with soft label consistency for object-based image retrieval","authors":"Fei Li, Rujie Liu","doi":"10.1109/ICME.2015.7177391","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177391","url":null,"abstract":"Object-based image retrieval has been an active research topic in the last decade, in which a user is only interested in some object instead of the whole image. As a promising approach, graph-based multi-instance learning has been paid much attention. Early retrieval methods often conduct learning on one graph in either image or region level. To further improve the performance, some recent methods adopt multi-graph learning, but the relationship between image- and region-level information is not well explored. In this paper, by constructing both image- and region-level graphs, a novel multi-graph multi-instance learning method is proposed. Different from the existing methods, the relationship between each labeled image and its segmented regions is reflected by the consistency of their corresponding soft labels, and it is formulated by the mutual restrictions in an optimization framework. A comprehensive cost function is designed to involve all the available information, and an iterative solution is introduced to solve the problem. Experimental results on the benchmark data set demonstrate the effectiveness of our proposal.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122973803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instructive video retrieval for surgical skill coaching using attribute learning","authors":"Lin Chen, Qiang Zhang, Peng Zhang, Baoxin Li","doi":"10.1109/ICME.2015.7177389","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177389","url":null,"abstract":"Video-based coaching systems have seen increasing adoption in various applications including dance, sports, and surgery training. Most existing systems are either passive (for data capture only) or barely active (with limited automated feedback to a trainee). In this paper, we present a video-based skill coaching system for simulation-based surgical training by exploring a newly proposed problem of instructive video retrieval. By introducing attribute learning into video for high-level skill understanding, we aim at providing automated feedback and providing an instructive video, to which the trainees can refer for performance improvement. This is achieved by ensuring the feedback is weakness-specific, skill-superior and content-similar. A suite of techniques was integrated to build the coaching system with these features. In particular, algorithms were developed for action segmentation, video attribute learning, and attribute-based video retrieval. Experiments with realistic surgical videos demonstrate the feasibility of the proposed method and suggest areas for further improvement.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133160300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image retrieval based on compressed camera sensor fingerprints","authors":"D. Valsesia, G. Coluccia, T. Bianchi, E. Magli","doi":"10.1109/ICME.2015.7177454","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177454","url":null,"abstract":"Image retrieval is the process of finding images from a large collection, satisfying a user-specified criterion. Content-based retrieval has been the traditional paradigm, in which one wishes to find images whose content is similar to a query. In this paper we explore a novel criterion for image search, based on forensic principles. We address the problem of retrieving all the photos in a collection that have been acquired by a specific device which is presented to the system as a query. This is an important forensic problem, whose solution could be very useful for detecting improper usage of pictures. We do not rely on metadata such as Exif headers because they can be unavailable, or easily manipulated, and in most cases cannot identify the specific device. We rely instead on a forensic tool called Photo Response Non-Uniformity (PRNU), which constitutes a reliable fingerprint of a camera sensor. We examine recent advances in compression of such fingerprints, which allow to address the previously unexplored image retrieval problem on large scales.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128269022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic learning from mislabelled data for multimedia content recognition","authors":"Pravin Kakar, A. Chia","doi":"10.1109/ICME.2015.7177393","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177393","url":null,"abstract":"There have been considerable advances in multimedia recognition recently as powerful computing capabilities and large, representative datasets become ubiquitous. A fundamental assumption of traditional recognition techniques is that the data available for training are accurately labelled. Given the scale and diversity of web data, it takes considerable annotation effort to reduce label noise to acceptable levels. In this work, we propose a novel method to work around this issue by utilizing approximate apriori estimates of the mislabelling probabilities to design a noise-aware learning framework. We demonstrate the proposed framework's effectiveness on several datasets of various modalities and show that it is able to achieve high levels of accuracy even when faced with significant mislabelling in the data.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126059344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing video sounds with sound word animation","authors":"Fangzhou Wang, H. Nagano, K. Kashino, T. Igarashi","doi":"10.1109/ICME.2015.7177422","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177422","url":null,"abstract":"Text captions are important means to provide sound information in videos when the sound is not accessible. However, conventional text captions are far less expressive for non-verbal sounds since they are designed to visualize speech sound. To address this problem, we propose a method for automatically transforming non-verbal video sounds to animated sound words, and positioning them near the sound source objects in the video for visualization. This provides natural visual representation of non-verbal sounds with rich information about the sound category and dynamics. We conducted a user study with over 300 participants using an online crowdsourcing service. The results showed that animated sound words could not only effectively and naturally visualize the dynamics of sound while clarify the position of the sound source, but also contribute to making video watching more enjoyable and increasing the visual impact of the video.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117039964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harmonic Change Detection for musical chords segmentation","authors":"Alessio Degani, M. Dalai, R. Leonardi, P. Migliorati","doi":"10.1109/ICME.2015.7177404","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177404","url":null,"abstract":"In this paper, different strategies for the calculation of the Harte's Harmonic Change Detection Function (HCDF) are discussed. HCDFs can be used for detecting chord boundaries for Automatic Chord Estimation (ACE) tasks, where the chord transitions are identified as peaks in the HCDF. We show that different audio features and different novelty metric have significant impact on the overall accuracy results of a chord segmentation algorithm. Furthermore, we show that certain combination of audio features and novelty measures provide a significant improvement with respect to the current chord segmentation algorithms.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132644180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust interactive image segmentation with weak supervision for mobile touch screen devices","authors":"T. Wang, Huiling Wang, Lixin Fan","doi":"10.1109/ICME.2015.7177395","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177395","url":null,"abstract":"In this paper, we present a robust and efficient approach for segmenting images with less and intuitive user interaction, particularly targeted for mobile touch screen devices. Our approach combines geodesic distance information with the flexibility of level set methods in energy minimization, leveraging the complementary strengths of each to promote accurate boundary placement and strong region connectivity while requiring less user interaction. To maximize the user-provided prior knowledge, we further propose a weakly supervised seed generation algorithm which enables image object segmentation without user-provided background seeds. Our approach provides a practical solution for visual object cutout on mobile touch screen devices, facilitating various media manipulation applications. We describe such a use case to selectively create oil painting effects on images. We demonstrate that our approach is less sensitive to seed placement and better at edge localization, whilst requiring less user interaction, compared with the state-of-the-art methods.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131593468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}