{"title":"A tracking framework for collaborative human computer interaction","authors":"E. Polat, M. Yeasin, Rajeev Sharma","doi":"10.1109/ICMI.2002.1166964","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1166964","url":null,"abstract":"The ability to track many people and their body parts (i.e., face and hands) in a complex environment is crucial for designing collaborative natural human computer interaction (HCI). A challenging issue in tracking body parts is the data association uncertainty while assigning measurements to the proper tracks in the case of occlusion and close interaction of body parts of different people. This paper describes a framework for tracking body parts of people in 2D/3D using a multiple hypothesis tracking (MHT) algorithm. A path coherence function has been incorporated along with MHT to reduce the negative effects of closely spaced measurements that produce unconvincing tracks and unnecessary computations. The performance of the framework has been validated using experiments on a real sequence of images.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130155643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using TouchPad pressure to detect negative affect","authors":"Helena M. Mentis, Geri Gay","doi":"10.1109/ICMI.2002.1167029","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1167029","url":null,"abstract":"Humans naturally use behavioral cues in their interactions with other humans. The Media Equation proposes that these same cues are directed towards media, including computers. It is probable that detection of these cues by a computer during run-time could improve usability design and analysis. A preliminary experiment testing one of these cues, Synaptics TouchPad pressure, shows that behavioral cues can be used as a critical incident indicator by detecting negative affect.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115834373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labial coarticulation modeling for realistic facial animation","authors":"P. Cosi, E. M. Caldognetto, Giulio Perin, C. Zmarich","doi":"10.1109/ICMI.2002.1167047","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1167047","url":null,"abstract":"A modified version of the coarticulation model proposed by Cohen and Massaro (1993) is described. A semi-automatic minimization technique, working on real cinematic data, acquired by the ELITE opto-electronic system, was used to train the dynamic characteristics of the model. Finally, the model was applied with success to GRETA, an Italian talking head, and examples are illustrated to show the naturalness of the resulting animation technique.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124176901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automatic speech translation system on PDAs for travel conversation","authors":"R. Isotani, Kiyoshi Yamabana, S. Ando, Ken Hanazawa, S. Ishikawa, T. Emori, K. Iso, H. Hattori, Akitoshi Okumura, Takao Watanabe","doi":"10.1109/ICMI.2002.1166995","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1166995","url":null,"abstract":"We present an automatic speech-to-speech translation system for personal digital assistants (PDAs) that helps oral communication between Japanese and English speakers in various situations while traveling. Our original compact large vocabulary continuous speech recognition engine, compact translation engine based on a lexicalized grammar, and compact Japanese speech synthesis engine lead to the development of a Japanese/English bi-directional speech translation system that works with limited computational resources.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124232970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A PDA-based sign translator","authors":"Jing Zhang, Xilin Chen, Jie Yang, A. Waibel","doi":"10.1109/ICMI.2002.1166996","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1166996","url":null,"abstract":"We propose an effective approach for a PDA-based sign system and present the sign translator. Its main functions include three parts: detection, recognition and translation. Automatic detection and recognition of text in natural scenes is a prerequisite for the automatic sign translator. In order to make the system robust for text detection in various natural scenes, the detection approach efficiently embeds multi-resolution, adaptive search in a hierarchical framework with different emphases at each layer. We also introduce an intensity-based OCR method to recognize characters in various fonts and lighting conditions, where we employ the Gabor transform to obtain local features, and LDA for selection and classification of features. The recognition rate is 92.4% for the testing set obtained from the natural sign. A sign is different from the normal used sentence. It is brief with a lot of abbreviations and place nouns. We only briefly introduce a rule-based place name translation. We have integrated all these functions in a PDA, which can capture sign images, auto segment and recognize the Chinese sign, and translate it into English.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128477927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-class pattern recognition system for practical finger spelling translation","authors":"J. L. Hernandez-Rebollar, R. Lindeman, N. Kyriakopoulos","doi":"10.1109/ICMI.2002.1166990","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1166990","url":null,"abstract":"The paper presents a portable system and method for recognizing the 26 hand shapes of the American Sign Language alphabet, using a novel glove-like device. Two additional signs, 'space', and 'enter' are added to the alphabet to allow the user to form words or phrases and send them to a speech synthesizer. Since the hand shape for a letter varies from one signer to another, this is a 28-class pattern recognition system. A three-level hierarchical classifier divides the problem into \"dispatchers\" and \"recognizers.\" After reducing pattern dimension from ten to three, the projection of class distributions onto horizontal planes makes it possible to apply simple linear discrimination in 2D, and Bayes' Rule in those cases where classes had features with overlapped distributions. Twenty-one out of 26 letters were recognized with 100% accuracy; the worst case, letter U, achieved 78%.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130723502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart Platform - a software infrastructure for Smart Space (SISS)","authors":"Weikai Xie, Yuanchun Shi, Guangyou Xu, Y. Mao","doi":"10.1109/ICMI.2002.1167033","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1167033","url":null,"abstract":"A software infrastructure is fundamental to a Smart Space. Previously proposed software infrastructures for Smart Space (SISS) did not sufficiently address the issue of performance and usability. A new solution, Smart Platform, which is focused on improving these aspects of a SISS, is presented in this paper. To optimize its intermodule communication performance, the stream-oriented communication is distinguished from the message-oriented ones, and a corresponding hybrid communication scheme is proposed. To improve the usability, a featured loose coupling structure, a straightforward Publish-and-Subscribe coordination model as well as a set of user-friendly deployment and development tools are developed. Besides, Smart Platform is intended as an open and generic SISS available for other research groups. To this end, XML-based message syntax and the open wire-protocol based architecture are adopted to make sharing research efforts more easily.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129691470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research of machine learning method for specific information recognition on the Internet","authors":"Dequan Zheng, Y. Hu, T. Zhao, Hao Yu, Sheng Li","doi":"10.1109/ICMI.2002.1166998","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1166998","url":null,"abstract":"With the available resources on the Internet becoming plentiful, a large amount of harmful information is permeating in and has been seriously affecting people's normal work and living. Therefore, harmful data streams must be recognized and filtered out effectively. After analyzing some harmful contents in Internet information streams, we present a new method, which recognizes specific information by machine learning (ML). We extracted key information from a number of corpuses through the ML method to obtain the part of speech (POS) transfer-form for key information by learning from corpuses, which is based on the same pronunciation matching of key information. Furthermore, the testing value of key information will be obtained in a real corpus to examine the likelihood between matching rules from information streams and those learnt from corpuses through the average value of POS transfer probability of key information. Therefore, the testing value for the whole real data stream will be obtained. The experiment proved that the method was efficient for recognizing certain Internet harmful information.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133041243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training a talking head","authors":"Michael M. Cohen, D. Massaro, R. Clark","doi":"10.1109/ICMI.2002.1167046","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1167046","url":null,"abstract":"A Cyberware laser scan of DWM was made, Baldi's generic morphology was mapped into the form of DWM, this head was trained on real data recorded with Optotrak LED markers, and the quality of its speech was evaluated. Participants were asked to recognize auditory sentences presented alone in noise, aligned with the newly trained synthetic textured mapped target face, or the original natural face. There was a significant advantage when the noisy auditory sentence was paired with either head, with the synthetic textured mapped target face giving as much of an improvement as the original recordings of the natural face.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126048080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimentally augmenting an intelligent tutoring system with human-supplied capabilities: adding human-provided emotional scaffolding to an automated reading tutor that listens","authors":"Gregory Aist, B. Kort, R. Reilly, Jack Mostow, Rosalind W. Picard","doi":"10.1109/ICMI.2002.1167044","DOIUrl":"https://doi.org/10.1109/ICMI.2002.1167044","url":null,"abstract":"We present the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system. We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit. We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought to lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126193216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}