{"title":"Towards computer vision with description logics: some recent progress","authors":"R. Moller, B. Neumann, Michael Wessel","doi":"10.1109/ISIU.1999.824868","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824868","url":null,"abstract":"A description logic (DL) is a knowledge representation formalism which may provide interesting inference services for diverse application areas. This paper first gives an overview of the benefits which a DL may provide for computer vision. The main body of the paper presents recent work at Hamburg University on extending DLs to handle spatial reasoning and default reasoning.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129319842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connecting concepts from vision and speech processing","authors":"S. Wachsmuth, G. Sagerer","doi":"10.1109/ISIU.1999.824829","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824829","url":null,"abstract":"This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. In order to get rid of erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. The modelling of dependencies and the calculation of inferences are realized by using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment. A robot constructor can be instructed by speech and pointing gestures in order to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set which consists of 448 spoken utterances from 16 speakers who name objects on 10 images from different scenes. First results show the effectiveness and robustness of the probabilistic approach.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122025265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From images to sentences via spatial relations","authors":"A. Abella, J. Kender","doi":"10.1109/ISIU.1999.824875","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824875","url":null,"abstract":"This work presents a conceptual framework for representing, manipulating, measuring, and communicating in natural language several ideas about topological (non-metric) spatial locations, object spatial contexts, and user expectations of spatial relationships. It articulates a theory of spatial relations, how they can be represented as fuzzy predicates internally, and how they can be appropriately derived from imagery; then, how they can be augmented or filtered using prior knowledge, and lastly, how they can produce natural language statements about location and space. This framework quantifies the notions of context and vagueness, so that all spatial relations are measurably accurate, provably efficient, and matched to users' expectations. The work makes explicit two critical heuristics for reducing the complexity of the relationships implicit in imagery, one a general rule for single object descriptions, and the other a general rule for rank ordering object relationships. A derived working system combines variable aspects of computer science and linguistics in such a way as to be extensible to many environments. The system has been demonstrated both in a landmark navigation task and in a medical task, two very separate domains, and has been evaluated in both.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128691135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge based image and speech analysis for service robots","authors":"U. Ahlrichs, J. Fischer, Joachim Denzler, C. Drexler, H. Niemann, E. Noth, D. Paulus","doi":"10.1109/ISIU.1999.824841","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824841","url":null,"abstract":"Active visual based scene exploration as well as speech understanding and dialogue are important skills of a service robot which is employed in natural environments and has to interact with humans. In this paper we suggest a knowledge based approach for both scene exploration and spoken dialogue using semantic networks. For scene exploration the knowledge base contains information about camera movements and objects. In the dialogue system the knowledge base contains information about the individual dialogue steps as well as about syntax and semantics of utterances. In order to make use of the knowledge, an iterative control algorithm which has real-time and any-time capabilities is applied. In addition, we propose appearance based object models which can substitute the object models represented in the knowledge base for scene exploration. We show the applicability of the approach for exploration of office scenes and for spoken dialogues in the experiments. The integration of the multi-sensory input can easily be done, since the knowledge about both application domains is represented using the same network formalism.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114691454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning audio-visual associations using mutual information","authors":"D. Roy, B. Schiele, A. Pentland","doi":"10.1109/ISIU.1999.824909","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824909","url":null,"abstract":"This paper addresses the problem of finding useful associations between audio and visual input signals. The proposed approach is based on the maximization of mutual information of audio-visual clusters. This approach results in segmentation of continuous speech signals, and finds visual categories which correspond to segmented spoken words. Such audio-visual associations may be used for modeling infant language acquisition and to dynamically personalize speech-based human-computer interfaces for various applications including catalog browsing and wearable computing. This paper describes an implemented system for learning shape names from camera and microphone input. We present results in an evaluation of the system for the domain of modeling language learning.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134257971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From video to language-a detour via logic vs. jumping to conclusions","authors":"H. Nagel","doi":"10.1109/ISIU.1999.824862","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824862","url":null,"abstract":"Temporal developments within a scene can be recorded by a video camera in the form of spatio-temporal grayvalue variations. The digitization and subsequent algorithmic evaluation of the resulting video sequence transforms, as a first step, the original signal into a geometric description which comprises the shape, position, and trajectory of bodies in the depicted 3D scene. In order to facilitate communication of this information to human users, it appears advantageous to transform such a geometric description as a second step into a fuzzy metric-temporal logic representation. This latter can be processed in turn by logic operations in order to extract the information of interest to a particular user at the time of his interaction with the system. This contribution discusses problems which show up in an attempt to specify and use a fuzzy metric-temporal logic representation of traffic situations at innercity road intersections.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129305051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards affective integration of vision, behavior, and speech processing","authors":"Naoyuki Okada, Kentaro Inui, M. Tokuhisa","doi":"10.1109/ISIU.1999.824850","DOIUrl":"https://doi.org/10.1109/ISIU.1999.824850","url":null,"abstract":"In each subfield of artificial intelligence such as image understanding, speech understanding, robotics, etc., a tremendous amount of research effort has so far yielded considerable results. Unfortunately, they have ended up too different to combine with one another straightforwardly. We have been conducting a case study, or AESOPWORLD project, aiming at establishing an architectural foundation of \"integrated\" intelligent agents. In this article, we first review our agent model, which integrates the seven mental and the two physical faculties: recognition, planning, action, desire, emotion, memory, language, and sensor, actuator. We then describe each faculty of recognition, action, and planning, and their interaction by centering around planning. Image understanding is understood as a part of this recognition. Next, we show dialogue processing, where the faculties of recognition and planning also play an essential role for communications. Finally, we discuss the faculty of emotions to show an application of our agent to affective communications. This computation of emotions could be expected to be a basis for human-friendly interfaces.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115112694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}