Location grounding in multimodal local search
Authors: Patrick Ehlen, Michael Johnston
Published in: ICMI-MLMI '10, 2010-11-08
DOI: 10.1145/1891903.1891945 (https://doi.org/10.1145/1891903.1891945)
Citations: 12
Abstract
Computational models of dialog context have often focused on unimodal spoken dialog or text, using the language itself as the primary locus of contextual information. But as we move from spoken interaction to situated multimodal interaction on mobile platforms supporting a combination of spoken dialog with graphical interaction, touch-screen input, geolocation, and other non-linguistic contextual factors, we will need more sophisticated models of context that capture the influence of these factors on semantic interpretation and dialog flow. Here we focus on how users establish the location they deem salient from the multimodal context by grounding it through interactions with a map-based query system. While many existing systems rely on geolocation to establish the location context of a query, we hypothesize that this approach often ignores the grounding actions users make, and provide an analysis of log data from one such system that reveals errors that arise from that faulty treatment of grounding. We then explore and evaluate, using live field data from a deployed multimodal search system, several different context classification techniques that attempt to learn the location contexts users make salient by grounding them through their multimodal actions.