Proceedings Integration of Speech and Image Understanding: Latest Papers

Towards computer vision with description logics: some recent progress

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824868

R. Moller, B. Neumann, Michael Wessel

Citations: 36

Connecting concepts from vision and speech processing

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824829

S. Wachsmuth, G. Sagerer

Citations: 6

From images to sentences via spatial relations

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824875

A. Abella, J. Kender

Abstract: This work presents a conceptual framework for representing, manipulating, measuring, and communicating in natural language several ideas about topological (non-metric) spatial locations, object spatial contexts, and user expectations of spatial relationships. It articulates a theory of spatial relations, how they can be represented as fuzzy predicates internally, and how they can be appropriately derived from imagery; then, how they can be augmented or filtered using prior knowledge, and lastly, how they can produce natural language statements about location and space. This framework quantifies the notions of context and vagueness, so that all spatial relations are measurably accurate, provably efficient, and matched to users' expectations. The work makes explicit two critical heuristics for reducing the complexity of the relationships implicit in imagery, one a general rule for single object descriptions, and the other a general rule for rank ordering object relationships. A derived working system combines variable aspects of computer science and linguistics in such a way as to be extensible to many environments. The system has been demonstrated both in a landmark navigation task and in a medical task, two very separate domains, and has been evaluated in both.

Citations: 22

Knowledge based image and speech analysis for service robots

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824841

U. Ahlrichs, J. Fischer, Joachim Denzler, C. Drexler, H. Niemann, E. Noth, D. Paulus

Abstract: Active visual based scene exploration as well as speech understanding and dialogue are important skills of a service robot which is employed in natural environments and has to interact with humans. In this paper we suggest a knowledge based approach for both scene exploration and spoken dialogue using semantic networks. For scene exploration the knowledge base contains information about camera movements and objects. In the dialogue system the knowledge base contains information about the individual dialogue steps as well as about syntax and semantics of utterances. In order to make use of the knowledge, an iterative control algorithm which has real-time and any-time capabilities is applied. In addition, we propose appearance based object models which can substitute the object models represented in the knowledge base for scene exploration. We show the applicability of the approach for exploration of office scenes and for spoken dialogues in the experiments. The integration of the multi-sensory input can easily be done, since the knowledge about both application domains is represented using the same network formalism.

Citations: 19

Learning audio-visual associations using mutual information

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824909

D. Roy, B. Schiele, A. Pentland

Citations: 33

From video to language - a detour via logic vs. jumping to conclusions

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824862

H. Nagel

Citations: 8

Towards affective integration of vision, behavior, and speech processing

Proceedings Integration of Speech and Image Understanding Pub Date : 1999-09-21 DOI: 10.1109/ISIU.1999.824850

Naoyuki Okada, Kentaro Inui, M. Tokuhisa

Abstract: In each subfield of artificial intelligence such as image understanding, speech understanding, robotics, etc., a tremendous amount of research effort has so far yielded considerable results. Unfortunately, they have ended up too different to combine with one another straightforwardly. We have been conducting a case study, the AESOPWORLD project, aiming at establishing an architectural foundation of "integrated" intelligent agents. In this article, we first review our agent model, which integrates the seven mental and the two physical faculties: recognition, planning, action, desire, emotion, memory, language, and sensor, actuator. We then describe each faculty of recognition, action, and planning, and their interaction by centering around planning. Image understanding is understood as a part of this recognition. Next, we show dialogue processing, where the faculties of recognition and planning also play an essential role for communications. Finally, we discuss the faculty of emotions to show an application of our agent to affective communications. This computation of emotions could be expected to be a basis for human-friendly interfaces.

Citations: 11