ICMI-MLMI '10 | Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891910
D. Bohus, E. Horvitz
"Facilitating multiparty dialog with gaze, gesture, and speech"
Abstract: We study how synchronized gaze, gesture, and speech rendered by an embodied conversational agent can influence the flow of conversation in multiparty settings. We begin by reviewing a computational framework for turn-taking that provides the foundation for tracking and communicating intentions to hold, release, or take control of the conversational floor. We then present implementation aspects of this model in an embodied conversational agent. Empirical results with this model in a shared-task setting indicate that the verbal and non-verbal cues used by the avatar can effectively shape multiparty conversational dynamics. In addition, we identify and discuss several context variables that impact the turn-allocation process.
ICMI-MLMI '10 | Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891945
Patrick Ehlen, Michael Johnston
"Location grounding in multimodal local search"
Abstract: Computational models of dialog context have often focused on unimodal spoken dialog or text, using the language itself as the primary locus of contextual information. But as we move from spoken interaction to situated multimodal interaction on mobile platforms supporting a combination of spoken dialog with graphical interaction, touch-screen input, geolocation, and other non-linguistic contextual factors, we will need more sophisticated models of context that capture the influence of these factors on semantic interpretation and dialog flow. Here we focus on how users establish the location they deem salient from the multimodal context by grounding it through interactions with a map-based query system. While many existing systems rely on geolocation to establish the location context of a query, we hypothesize that this approach often ignores the grounding actions users make, and we provide an analysis of log data from one such system that reveals errors arising from that faulty treatment of grounding. We then explore and evaluate, using live field data from a deployed multimodal search system, several context classification techniques that attempt to learn the location contexts users make salient by grounding them through their multimodal actions.
ICMI-MLMI '10 | Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891934
Qiong Liu, Chunyuan Liao, L. Wilcox, Anthony Dunnigan
"Embedded media barcode links: optimally blended barcode overlay on paper for linking to associated media"
Abstract: Embedded Media Barcode Links (EMBLs) are optimally blended iconic barcode marks, printed on paper documents, that signify the existence of multimedia associated with that part of the document content (Figure 1). EMBLs are used for multimedia retrieval with a camera phone: users take a picture of an EMBL-signified document patch with a cell phone, and the multimedia associated with that document location is displayed on the phone. Unlike a traditional barcode, which requires exclusive space, the EMBL construction algorithm acts as an agent that negotiates with a barcode reader for maximum user and document benefit. Because of this negotiation, EMBLs are optimally blended with content, interfere less with the original document layout, and can be moved closer to the location of the associated media. Retrieval of the media associated with an EMBL is based on barcode identification of the captured EMBL, so EMBLs retain nearly all the advantages of barcode identification, such as accuracy, speed, and scalability. Moreover, EMBLs take advantage of users' familiarity with traditional barcodes. Unlike Embedded Media Markers (EMMs), which require underlying document features for marker identification, EMBLs place no requirements on the underlying features. This paper discusses the procedures for EMBL construction and optimization, and presents experimental results that strongly support the EMBL construction and optimization ideas.
ICMI-MLMI '10 | Pub Date: 2010-11-01 | DOI: 10.1145/1891903.1891924
Ying Yin, Randall Davis
"Toward natural interaction in the real world: real-time gesture recognition"
Abstract: Using a new hand-tracking technology capable of tracking 3D hand postures in real time, we developed a recognition system for continuous natural gestures. By natural gestures, we mean those encountered in spontaneous interaction, rather than a set of artificial gestures chosen to simplify recognition. To date we have achieved 95.6% accuracy on isolated gesture recognition and a 73% recognition rate on continuous gesture recognition, with data from 3 users and 12 gesture classes. We connected our gesture recognition system to Google Earth, enabling real-time gestural control of a 3D map. We describe the challenges of signal accuracy and signal interpretation presented by working in a real-world environment, and detail how we overcame them.