Cheng-Hsin Hsu, Hua-Jun Hong, Tarek Elgamal, K. Nahrstedt, N. Venkatasubramanian
{"title":"Multimedia fog computing: minions in the cloud and crowd","authors":"Cheng-Hsin Hsu, Hua-Jun Hong, Tarek Elgamal, K. Nahrstedt, N. Venkatasubramanian","doi":"10.1145/3122865.3122876","DOIUrl":"https://doi.org/10.1145/3122865.3122876","url":null,"abstract":"In cloud computing, minions refer to virtual or physical machines that carry out the actual workload. Minions in the cloud hide in faraway data centers, and thus cloud computing is less friendly to multimedia applications. The fog computing paradigm pushes minions toward edge networks. We adopt a generalized definition, where minions get into end devices owned by the crowd. Serious uncertainty, such as dynamic network conditions, limited battery levels, and unpredictable minion availability, makes multimedia fog platforms harder to manage than cloud platforms. In this chapter, we share our experience in utilizing resources from the crowd to optimize multimedia applications. The lessons learned shed some light on the optimal design of a unified multimedia fog platform for distributed multimedia applications.","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130483581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Friedland, P. Smaragdis, Josh H. McDermott, B. Raj
{"title":"Audition for multimedia computing","authors":"G. Friedland, P. Smaragdis, Josh H. McDermott, B. Raj","doi":"10.1145/3122865.3122868","DOIUrl":"https://doi.org/10.1145/3122865.3122868","url":null,"abstract":"What do the fields of robotics, human-computer interaction, AI, video retrieval, privacy, cybersecurity, Internet of Things, and big data all have in common? They all work with various sources of data: visual, textual, time stamps, links, records. But there is one source of data that has been almost completely ignored by the academic community---sound.\n\nOur comprehension of the world relies critically on audition---the ability to perceive and interpret the sounds we hear. Sound is ubiquitous, and is a unique source of information about our environment and the events occurring in it. Just by listening, we can determine whether our child's laughter originated inside or outside our house, how far away they were when they laughed, and whether the window through which the sound passed was open or shut. The ability to derive information about the world from sound is a core aspect of perceptual intelligence.\n\nAuditory inferences are often complex and sophisticated despite their routine occurrence. The number of possible inferences is typically not enumerable, and the final interpretation is not merely one of selection from a fixed set. And yet humans perform such inferences effortlessly, based only on sounds captured using two sensors, our ears.\n\nElectronic devices can also \"perceive\" sound. Every phone and tablet has at least one microphone, as do most cameras. Any device or space can be equipped with microphones at minimal expense. Indeed, machines can not only \"listen\"; they have potential advantages over humans as listening devices, in that they can communicate and coordinate their experiences in ways that biological systems simply cannot. 
Collections of devices that can sense sound and communicate with each other could instantiate a single electronic entity that far surpasses humans in its ability to record and process information from sound.\n\nAnd yet machines at present cannot truly hear. Apart from well-developed efforts to recover structure in speech and music, the state of the art in machine hearing is limited to relatively impoverished descriptions of recorded sounds: detecting occurrences of a limited pre-specified set of sound types, and their locations. Although researchers typically envision artificially intelligent agents such as robots to have human-like hearing abilities, at present the rich descriptions and inferences humans can make about sound are entirely beyond the capability of machine systems.\n\nIn this chapter, we suggest establishing the field of Computer Audition to develop the theory behind artificial systems that extract information from sound. Our objective is to enable computer systems to replicate and exceed human abilities. This chapter describes the challenges of this field.","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123744981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient similarity search","authors":"H. Jégou","doi":"10.1145/3122865.3122871","DOIUrl":"https://doi.org/10.1145/3122865.3122871","url":null,"abstract":"This chapter addresses one of the fundamental problems involved in multimedia systems, namely efficient similarity search for large collections of multimedia content. This problem has received a lot of attention from various research communities. In particular, it is a historical line of research in computational geometry and databases. The computer vision and multimedia communities have adopted pragmatic approaches guided by practical requirements: the large sets of features required to describe image collections make visual search a highly demanding task. As a result, early works [Flickner et al. 1995, Fagin 1998, Beis and Lowe 1997] in image indexing have foreseen the interest in approximate algorithms, especially after the dissemination of methods based on local description in the 90s, as any improvement obtained on this indexing part improves the whole visual search system.\n\nAmong the existing approximate nearest neighbors (ANN) strategies, the popular framework of Locality-Sensitive Hashing (LSH) [Indyk and Motwani 1998, Gionis et al. 1999] provides theoretical guarantees on the search quality with limited assumptions on the underlying data distribution. It was first proposed [Indyk and Motwani 1998] for the Hamming and l1 spaces, and was later extended to the Euclidean/cosine cases [Charikar 2002, Datar et al. 2004] or the earth mover's distance [Charikar 2002, Andoni and Indyk 2006]. LSH has been successfully used for local descriptors [Ke et al. 2004], 3D object indexing [Matei et al. 2006, Shakhnarovich et al. 2006], and other fields such as audio retrieval [Casey and Slaney 2007, Ryynanen and Klapuri 2008]. It has also received some attention in a context of private information retrieval [Pathak and Raj 2012, Aghasaryan et al. 2013, Furon et al. 2013].
\n\nA few years ago, approaches inspired by compression and more specifically quantization-based approaches [Jégou et al. 2011] were shown to be a viable alternative to hashing methods, and proved successful for efficient search in a billion-sized dataset.\n\nThis chapter discusses these different trends. It is organized as follows. Section 5.1 gives some background references and concepts, including evaluation issues. Most of the methods and variants are exposed within the LSH framework. It is worth mentioning that LSH is more of a concept than a particular algorithm. The search algorithms associated with LSH follow two distinct search mechanisms, the cell-probe model and sketches, which are discussed in Sections 5.2 and 5.3, respectively. Section 5.4 describes methods inspired by compression algorithms, while Section 5.5 discusses hybrid approaches combining the non-exhaustiveness of the cell-probe model with the advantages of sketches or compression-based algorithms. Metrics other than Euclidean and cosine are briefly discussed in Section 5.6.","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114157877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
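The sketch mechanism named in the "Efficient similarity search" abstract can be illustrated with sign random projections for the cosine case, the construction usually attributed to [Charikar 2002]. The following is only a minimal illustration under stated assumptions, not the chapter's implementation: the function names `make_hyperplanes`, `sketch`, and `hamming`, the Gaussian hyperplanes, and the bit count are all hypothetical choices made for this example.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dimension dim (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec, planes):
    """Binary sketch: one bit per hyperplane, set when the dot product is non-negative."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two sketches differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=32, seed=42)
    query = [1.0, 2.0, 3.0]
    near = [1.1, 2.0, 2.9]       # small angle to the query
    far = [-1.0, -2.0, -3.0]     # opposite direction
    print(hamming(sketch(query, planes), sketch(near, planes)))
    print(hamming(sketch(query, planes), sketch(far, planes)))
```

The probability that a single bit differs between two vectors grows with the angle between them, so the Hamming distance between sketches serves as a cheap proxy for cosine dissimilarity; more bits give a tighter estimate at the cost of memory.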
{"title":"Hawkes processes for events in social media","authors":"Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra","doi":"10.1145/3122865.3122874","DOIUrl":"https://doi.org/10.1145/3122865.3122874","url":null,"abstract":"This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and key concepts in point processes. We then introduce the Hawkes process and its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data---we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available in an online repository.","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116493447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
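The event intensity function and the simulation scheme mentioned in the Hawkes abstract can be sketched as follows. This is a hedged illustration, not the chapter's code: it assumes an exponential memory kernel, whereas the abstract says the authors design their own kernel for retweet cascades, and the parameter names `mu`, `alpha`, and `delta` are hypothetical. Simulation uses the standard thinning approach (Ogata's algorithm).

```python
import math
import random


def intensity(t, events, mu, alpha, delta):
    """Conditional intensity lambda(t) = mu + sum over past events t_i < t
    of alpha * exp(-delta * (t - t_i)). Exponential kernel is an assumption."""
    return mu + sum(alpha * math.exp(-delta * (t - ti)) for ti in events if ti < t)


def simulate(mu, alpha, delta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning. Requires mu > 0."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # The kernel only decays between events, so the intensity just after
        # the current time (including an event accepted at t) bounds what follows.
        upper = mu + sum(alpha * math.exp(-delta * (t - ti)) for ti in events if ti <= t)
        t += rng.expovariate(upper)          # candidate from a Poisson process of rate upper
        if t >= horizon:
            return events
        # Accept the candidate with probability lambda(t) / upper.
        if rng.random() * upper <= intensity(t, events, mu, alpha, delta):
            events.append(t)


if __name__ == "__main__":
    times = simulate(mu=0.5, alpha=0.8, delta=1.0, horizon=20.0, seed=1)
    print(len(times), "events; intensity at horizon:",
          intensity(20.0, times, 0.5, 0.8, 1.0))
```

With this kernel each event triggers on average `alpha / delta` further events, so values below 1 keep the process from growing without bound over long horizons.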
{"title":"Situation recognition using multimodal data","authors":"Vivek K. Singh","doi":"10.1145/3122865.3122873","DOIUrl":"https://doi.org/10.1145/3122865.3122873","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130320065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kuan-Ta Chen, Wei Cai, R. Shea, Chun-Ying Huang, Jiangchuan Liu, Victor C. M. Leung, Cheng-Hsin Hsu
{"title":"Cloud gaming","authors":"Kuan-Ta Chen, Wei Cai, R. Shea, Chun-Ying Huang, Jiangchuan Liu, Victor C. M. Leung, Cheng-Hsin Hsu","doi":"10.1145/3122865.3122877","DOIUrl":"https://doi.org/10.1145/3122865.3122877","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128808785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal analysis of free-standing conversational groups","authors":"Xavier Alameda-Pineda, E. Ricci, N. Sebe","doi":"10.1145/3122865.3122869","DOIUrl":"https://doi.org/10.1145/3122865.3122869","url":null,"abstract":"\"Free-standing conversational groups\" are what we call the elementary building blocks of social interactions formed in settings where people stand and congregate in groups. The automatic detection, analysis, and tracking of such structural conversational units captured on camera poses many interesting challenges for the research community. First, although delineating these formations is strongly linked to other behavioral cues such as head and body poses, finding methods that successfully describe and exploit these links is not obvious. Second, the use of visual data is crucial, but when analyzing crowded scenes, one must account for occlusions and low-resolution images. In this regard, the use of other sensing technologies such as wearable devices can facilitate the analysis of social interactions by complementing the visual information. Yet the exploitation of multiple modalities poses other challenges in terms of data synchronization, calibration, and fusion. In this chapter, we discuss recent advances in multimodal social scene analysis, in particular for the detection of conversational groups or F-formations [Kendon 1990]. More precisely, a multimodal joint head and body pose estimator is described and compared to other recent approaches for head and body pose estimation and F-formation detection. 
Experimental results on the recently published SALSA dataset are reported; they evidence the long road toward a fully automated high-precision social scene analysis framework.","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127149919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utilizing implicit user cues for multimedia analytics","authors":"Subramanian Ramanathan, S. O. Gilani, N. Sebe","doi":"10.1145/3122865.3122875","DOIUrl":"https://doi.org/10.1145/3122865.3122875","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127271110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}