Frontiers of Multimedia Research: Latest Articles

Multimedia fog computing: minions in the cloud and crowd

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122876

Cheng-Hsin Hsu, Hua-Jun Hong, Tarek Elgamal, K. Nahrstedt, N. Venkatasubramanian

Citations: 4

Audition for multimedia computing

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122868

G. Friedland, P. Smaragdis, Josh H. McDermott, B. Raj

Abstract: What do the fields of robotics, human-computer interaction, AI, video retrieval, privacy, cybersecurity, Internet of Things, and big data all have in common? They all work with various sources of data: visual, textual, time stamps, links, records. But there is one source of data that has been almost completely ignored by the academic community: sound.

Our comprehension of the world relies critically on audition, the ability to perceive and interpret the sounds we hear. Sound is ubiquitous, and is a unique source of information about our environment and the events occurring in it. Just by listening, we can determine whether our child's laughter originated inside or outside our house, how far away they were when they laughed, and whether the window through which the sound passed was open or shut. The ability to derive information about the world from sound is a core aspect of perceptual intelligence.

Auditory inferences are often complex and sophisticated despite their routine occurrence. The number of possible inferences is typically not enumerable, and the final interpretation is not merely one of selection from a fixed set. And yet humans perform such inferences effortlessly, based only on sounds captured using two sensors, our ears.

Electronic devices can also "perceive" sound. Every phone and tablet has at least one microphone, as do most cameras. Any device or space can be equipped with microphones at minimal expense. Indeed, machines can not only "listen"; they have potential advantages over humans as listening devices, in that they can communicate and coordinate their experiences in ways that biological systems simply cannot. Collections of devices that can sense sound and communicate with each other could instantiate a single electronic entity that far surpasses humans in its ability to record and process information from sound.

And yet machines at present cannot truly hear. Apart from well-developed efforts to recover structure in speech and music, the state of the art in machine hearing is limited to relatively impoverished descriptions of recorded sounds: detecting occurrences of a limited pre-specified set of sound types, and their locations. Although researchers typically envision artificially intelligent agents such as robots to have human-like hearing abilities, at present the rich descriptions and inferences humans can make about sound are entirely beyond the capability of machine systems.

In this chapter, we suggest establishing the field of Computer Audition to develop the theory behind artificial systems that extract information from sound. Our objective is to enable computer systems to replicate and exceed human abilities. This chapter describes the challenges of this field.

Citations: 1

Efficient similarity search

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122871

H. Jégou

Abstract: This chapter addresses one of the fundamental problems involved in multimedia systems, namely efficient similarity search for large collections of multimedia content. This problem has received a lot of attention from various research communities. In particular, it is a historical line of research in computational geometry and databases. The computer vision and multimedia communities have adopted pragmatic approaches guided by practical requirements: the large sets of features required to describe image collections make visual search a highly demanding task. As a result, early works [Flickner et al. 1995, Fagin 1998, Beis and Lowe 1997] in image indexing foresaw the value of approximate algorithms, especially after the dissemination of methods based on local description in the 90s, as any improvement obtained on this indexing part improves the whole visual search system.

Among the existing approximate nearest neighbors (ANN) strategies, the popular framework of Locality-Sensitive Hashing (LSH) [Indyk and Motwani 1998, Gionis et al. 1999] provides theoretical guarantees on the search quality with limited assumptions on the underlying data distribution. It was first proposed [Indyk and Motwani 1998] for the Hamming and l1 spaces, and was later extended to the Euclidean/cosine cases [Charikar 2002, Datar et al. 2004] or the earth mover's distance [Charikar 2002, Andoni and Indyk 2006]. LSH has been successfully used for local descriptors [Ke et al. 2004], 3D object indexing [Matei et al. 2006, Shakhnarovich et al. 2006], and other fields such as audio retrieval [Casey and Slaney 2007, Ryynanen and Klapuri 2008]. It has also received some attention in the context of private information retrieval [Pathak and Raj 2012, Aghasaryan et al. 2013, Furon et al. 2013].

A few years ago, approaches inspired by compression, and more specifically quantization-based approaches [Jégou et al. 2011], were shown to be a viable alternative to hashing methods and to search billion-sized datasets efficiently.

This chapter discusses these different trends. It is organized as follows. Section 5.1 gives some background references and concepts, including evaluation issues. Most of the methods and variants are presented within the LSH framework. It is worth mentioning that LSH is more of a concept than a particular algorithm. The search algorithms associated with LSH follow two distinct search mechanisms, the cell-probe model and sketches, which are discussed in Sections 5.2 and 5.3, respectively. Section 5.4 describes methods inspired by compression algorithms, while Section 5.5 discusses hybrid approaches combining the non-exhaustiveness of the cell-probe model with the advantages of sketches or compression-based algorithms. Metrics other than Euclidean and cosine are briefly discussed in Section 5.6.
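The two LSH search mechanisms named in this abstract can be illustrated with a minimal sketch. It assumes the random-hyperplane scheme for cosine similarity (the family usually attributed to Charikar 2002); it is not taken from the chapter itself, and every function name below (`make_hyperplanes`, `sketch`, `hamming`, `build_index`) is hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec, hyperplanes):
    """Binary sketch: one bit per hyperplane, set when vec lies on its non-negative side."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def hamming(a, b):
    """Number of differing bits; a smaller value suggests a smaller angle between vectors."""
    return sum(x != y for x, y in zip(a, b))


def build_index(vectors, hyperplanes):
    """Cell-probe style index: vectors sharing a sketch fall into the same cell."""
    index = {}
    for i, v in enumerate(vectors):
        index.setdefault(sketch(v, hyperplanes), []).append(i)
    return index


planes = make_hyperplanes(dim=3, n_bits=16)
v = [1.0, 2.0, 3.0]
data = [v, [2.0 * x for x in v], [-x for x in v]]
index = build_index(data, planes)
candidates = index[sketch(v, planes)]  # only this cell is inspected at query time
```

Comparing sketches with `hamming` corresponds to the sketch mechanism, while looking up a single bucket in `index` corresponds to the non-exhaustive cell-probe mechanism. A practical system would typically use several hash tables and probe more than one cell.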

Citations: 13

Encrypted domain multimedia content analysis

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122870

P. Atrey, Ankita Lathey, M. A. Yakubu

Citations: 0

Hawkes processes for events in social media

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122874

Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra

Citations: 46

Situation recognition using multimodal data

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122873

Vivek K. Singh

Citations: 1

Cloud gaming

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122877

Kuan-Ta Chen, Wei Cai, R. Shea, Chun-Ying Huang, Jiangchuan Liu, Victor C. M. Leung, Cheng-Hsin Hsu

Citations: 2

Multimodal analysis of free-standing conversational groups

Frontiers of Multimedia Research Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122869

Xavier Alameda-Pineda, E. Ricci, N. Sebe

Abstract: "Free-standing conversational groups" are what we call the elementary building blocks of social interactions formed in settings where people are standing and congregate in groups. The automatic detection, analysis, and tracking of such structural conversational units captured on camera poses many interesting challenges for the research community. First, although delineating these formations is strongly linked to other behavioral cues such as head and body poses, finding methods that successfully describe and exploit these links is not obvious. Second, the use of visual data is crucial, but when analyzing crowded scenes, one must account for occlusions and low-resolution images. In this regard, the use of other sensing technologies such as wearable devices can facilitate the analysis of social interactions by complementing the visual information. Yet the exploitation of multiple modalities poses other challenges in terms of data synchronization, calibration, and fusion. In this chapter, we discuss recent advances in multimodal social scene analysis, in particular for the detection of conversational groups or F-formations [Kendon 1990]. More precisely, a multimodal joint head and body pose estimator is described and compared to other recent approaches for head and body pose estimation and F-formation detection. Experimental results on the recently published SALSA dataset are reported; they show how long the road remains toward a fully automated, high-precision social scene analysis framework.

Citations: 1

Utilizing implicit user cues for multimedia analytics

Frontiers of Multimedia Research Pub Date : 1900-01-01 DOI: 10.1145/3122865.3122875

Subramanian Ramanathan, S. O. Gilani, N. Sebe

Citations: 0