{"title":"Naturally conveyed explanations of device behavior","authors":"Michael Oltmans, Randall Davis","doi":"10.1145/971478.971498","DOIUrl":"https://doi.org/10.1145/971478.971498","url":null,"abstract":"Designers routinely explain their designs to one another using sketches and verbal descriptions of behavior, both of which can be understood long before the device has been fully specified. But current design tools fail almost completely to support this sort of interaction, instead not only forcing designers to specify details of the design, but typically requiring that they do so by navigating a forest of menus and dialog boxes, rather than directly describing the behaviors with sketches and verbal explanations. We have created a prototype system, called assistance, capable of interpreting multimodal explanations for simple 2-D kinematic devices. The program generates a model of the events and the causal relationships between events that have been described via hand drawn sketches, sketched annotations, and verbal descriptions. Our goal is to make the designer's interaction with the computer more like interacting with another designer. This requires the ability not only to understand physical devices but also to understand the means by which the explanations of these devices are conveyed.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121526965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust finger tracking for wearable computer interfacing","authors":"S. M. Dominguez, T. Keaton, A. H. Sayed","doi":"10.1145/971478.971516","DOIUrl":"https://doi.org/10.1145/971478.971516","url":null,"abstract":"Key to the design of human-machine gesture interface applications is the ability of the machine to quickly and efficiently identify and track the hand movements of its user. In a wearable computer system equipped with head-mounted cameras, this task is extremely difficult due to the uncertain camera motion caused by the user's head movement, the user standing still then randomly walking, and the user's hand or pointing finger abruptly changing directions at variable speeds. This paper presents a tracking methodology based on a robust state-space estimation algorithm, which attempts to control the influence of uncertain environment conditions on the system's performance by adapting the tracking model to compensate for the uncertainties inherent in the data. Our system tracks a user's pointing gesture from a single head mounted camera, to allow the user to encircle an object of interest, thereby coarsely segmenting the object. The snapshot of the object is then passed to a recognition engine for identification, and retrieval of any pre-stored information regarding the object. A comparison of our robust tracker against a plain Kalman tracker showed a 15% improvement in the estimated position error, and exhibited a faster response time.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128871354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards reliable multimodal sensing in aware environments","authors":"Scott Stillman, Irfan Essa","doi":"10.1145/971478.971499","DOIUrl":"https://doi.org/10.1145/971478.971499","url":null,"abstract":"A prototype system for implementing a reliable sensor network for large scale smart environments is presented. Most applications within any form of smart environments (rooms, offices, homes, etc.) are dependent on reliable who, where, when, and what information of its inhabitants (users). This information can be inferred from different sensors spread throughout the space. However, isolated sensing technologies provide limited information under the varying, dynamic, and long-term scenarios (24/7), that are inherent in applications for intelligent environments. In this paper, we present a prototype system that provides an infrastructure for leveraging the strengths of different sensors and processes used for the interpretation of their collective data. We describe the needs of such systems, propose an architecture to dealwith such multi-modal fusion, and discuss the initial set of sensors and processes used to address such needs.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"295 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127619060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal optimizations: can legacy systems defeat them?","authors":"J. Harper, D. Sweeney","doi":"10.1145/971478.971493","DOIUrl":"https://doi.org/10.1145/971478.971493","url":null,"abstract":"This paper describes several results obtained during the implementation and evaluation of a speech complemented interface to a vehicle monitoring system. A speech complemented interface is one wherein the operations at the interface (keyboard and mouse, for instance) are complemented by operator speech not directly processed by the computer. Such systems from an interface perspective have 'low brow' multimodal characteristics. Typical domains include vehicle tracking applications (taxis, buses, freight) where operators frequently use speech to confirm displayed vehicle properties with a driver.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"356 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116240542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A video joystick from a toy","authors":"G. Bradski, V. Eruhimov, Sergey Molinov, V. Mosyagin, Vadim Pisarevsky","doi":"10.1145/971478.971518","DOIUrl":"https://doi.org/10.1145/971478.971518","url":null,"abstract":"The paper describes an algorithm for 3D reconstruction of a toy composed from rigid bright colored blocks with the help of a conventional video camera. The blocks are segmented using histogram thresholds and merged into one connected component corresponding to the whole toy. We also present the algorithm for extracting the color structure and matching feature points across the frames and discuss robust structure from motion and recognition connected with the subject.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128469631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physiological data feedback for application in distance education","authors":"M. Crosby, B. Auernheimer, C. Aschwanden, C. Ikehara","doi":"10.1145/971478.971496","DOIUrl":"https://doi.org/10.1145/971478.971496","url":null,"abstract":"This paper describes initial experiments collecting physiological data from subjects performing computer tasks. A prototype realtime Emotion Mouse collected skin temperature, galvanic skin response (GSR), and heartbeat data. Possible applications to distance education, and a second-generation system are discussed.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"10 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124195359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio-video array source separation for perceptual user interfaces","authors":"K. Wilson, N. Checka, D. Demirdjian, Trevor Darrell","doi":"10.1145/971478.971500","DOIUrl":"https://doi.org/10.1145/971478.971500","url":null,"abstract":"Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134565062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Signal level fusion for multimodal perceptual user interface","authors":"John W. Fisher III, Trevor Darrell","doi":"10.1145/971478.971482","DOIUrl":"https://doi.org/10.1145/971478.971482","url":null,"abstract":"Multi-modal fusion is an important, yet challenging task for perceptual user interfaces. Humans routinely perform complex and simple tasks in which ambiguous auditory and visual data are combined in order to support accurate perception. By contrast, automated approaches for processing multi-modal data sources lag far behind. This is primarily due to the fact that few methods adequately model the complexity of the audio/visual relationship. We present an information theoretic approach for fusion of multiple modalities. Furthermore we discuss a statistical model for which our approach to fusion is justified. We present empirical results demonstrating audio-video localization and consistency measurement. We show examples determining where a speaker is within a scene, and whether they are producing the specified audio stream.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123483127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"First steps towards automatic recognition of spontaneous facial action units","authors":"B. Braathen, M. Bartlett, G. Littlewort, J. Movellan","doi":"10.1145/971478.971515","DOIUrl":"https://doi.org/10.1145/971478.971515","url":null,"abstract":"We present ongoing work on a project for automatic recognition of spontaneous facial actions (FACs). Current methods for automatic facial expression recognition assume images are collected in controlled environments in which the subjects deliberately face the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. There are many promising approaches to address the problem of out-of-image plane rotations. In this paper we explore an approach based on 3-D warping of images into canonical views. Since our goal is to explore the potential of this approach, we first tried with images with 8 hand-labeled facial landmarks. However the approach can be generalized in a straight-forward manner to work automatically based on the output of automatic feature detectors. A front-end system was developed that jointly estimates camera parameters, head geometry and 3-D head pose across entire sequences of video images. Head geometry and image parameters were assumed constant across images and 3-D head pose is allowed to vary. First a a small set of images was used to estimate camera parameters and 3D face geometry. Markov chain Monte-Carlo methods were then used to recover the most-likely sequence of 3D poses given a sequence of video images. Once the 3D pose was known, we warped each image into frontal views with a canonical face geometry. We evaluate the performance of the approach as a front-end for an spontaneous expression recognition task.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"490 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116692961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A perceptual user interface for recognizing head gesture acknowledgements","authors":"James W. Davis, Serge Vaks","doi":"10.1145/971478.971504","DOIUrl":"https://doi.org/10.1145/971478.971504","url":null,"abstract":"We present the design and implementation of a perceptual user interface for a responsive dialog-box agent that employs real-time computer vision to recognize user acknowledgements from head gestures (e.g., nod = yes). IBM Pupil-Cam technology together with anthropometric head and face measures are used to first detect the location of the user's face. Salient facial features are then identi ed and tracked to compute the global 2-D motion direction of the head. For recognition, timings of natural gesture motion are incorporated into a state-space model. The interface is presented in the context of an enhanced text editor employing a perceptual dialog-box agent.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126659586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}