Proceedings Integration of Speech and Image Understanding: Latest Publications

Towards computer vision with description logics: some recent progress
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824868
R. Moller, B. Neumann, Michael Wessel
Abstract: A description logic (DL) is a knowledge representation formalism that may provide interesting inference services for diverse application areas. This paper first gives an overview of the benefits a DL may provide for computer vision. The main body of the paper presents recent work at Hamburg University on extending DLs to handle spatial reasoning and default reasoning.
Citations: 36

Connecting concepts from vision and speech processing
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824829
S. Wachsmuth, G. Sagerer
Abstract: This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. To cope with erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. Dependencies are modelled and inferences are calculated using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment, where a robot constructor can be instructed by speech and pointing gestures to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set of 448 spoken utterances from 16 speakers who name objects in 10 images of different scenes. First results show the effectiveness and robustness of the probabilistic approach.
Citations: 6
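
The central step of such a probabilistic interaction scheme, scoring which visible object an utterance most likely refers to, can be illustrated with a deliberately simplified sketch. It assumes a flat naive-Bayes model in place of the authors' Bayesian networks; the function name `referent_posterior`, the object identifiers, the `floor` parameter, and all probabilities are hypothetical and chosen only for illustration.

```python
from math import prod


def referent_posterior(priors, likelihoods, observed, floor=0.01):
    """Return P(object | observed words) for every candidate object.

    priors:      {object_id: P(object)}, e.g. from the visual scene analysis
    likelihoods: {object_id: {word: P(word | object)}}
    observed:    words recognized in the spoken utterance
    floor:       probability used for words an object's table does not list
    """
    scores = {}
    for obj, prior in priors.items():
        # Naive-Bayes assumption: words are conditionally independent given
        # the object. The floor keeps one unexpected or misrecognized word
        # from driving an otherwise plausible object's score to zero.
        scores[obj] = prior * prod(likelihoods[obj].get(w, floor) for w in observed)
    total = sum(scores.values())
    # Normalize so the scores form a probability distribution over objects.
    return {obj: s / total for obj, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical scene: three toy parts, equally likely a priori.
    priors = {"red_cube": 1 / 3, "red_bolt": 1 / 3, "blue_bolt": 1 / 3}
    likelihoods = {
        "red_cube": {"red": 0.9, "cube": 0.9, "bolt": 0.05},
        "red_bolt": {"red": 0.9, "bolt": 0.9, "cube": 0.05},
        "blue_bolt": {"blue": 0.9, "bolt": 0.9, "red": 0.05},
    }
    print(referent_posterior(priors, likelihoods, ["red", "bolt"]))
```

With these made-up numbers, the utterance "red bolt" gives `red_bolt` a posterior of about 0.9. A full Bayesian network would additionally model dependencies between attributes and could fold in evidence such as pointing gestures, which this flat sketch does not attempt.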

From images to sentences via spatial relations
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824875
A. Abella, J. Kender
Abstract: This work presents a conceptual framework for representing, manipulating, measuring, and communicating in natural language several ideas about topological (non-metric) spatial locations, object spatial contexts, and user expectations of spatial relationships. It articulates a theory of spatial relations: how they can be represented internally as fuzzy predicates, how they can be appropriately derived from imagery, how they can be augmented or filtered using prior knowledge, and lastly, how they can produce natural language statements about location and space. This framework quantifies the notions of context and vagueness, so that all spatial relations are measurably accurate, provably efficient, and matched to users' expectations. The work makes explicit two critical heuristics for reducing the complexity of the relationships implicit in imagery: a general rule for single-object descriptions, and a general rule for rank-ordering object relationships. A derived working system combines variable aspects of computer science and linguistics in such a way as to be extensible to many environments. The system has been demonstrated both in a landmark navigation task and in a medical task, two very different domains, and has been evaluated in both.
Citations: 22
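
The abstract states that spatial relations are represented internally as fuzzy predicates derived from imagery. The sketch below shows one common way a directional fuzzy predicate can be computed; it is an assumption for illustration, not the authors' definition. The name `left_of_degree`, the use of object centroids, and the cosine-based membership function are all hypothetical choices.

```python
import math


def left_of_degree(a, b):
    """Fuzzy degree in [0, 1] to which point a lies to the left of point b.

    a, b: (x, y) object centroids in image coordinates, x growing rightward.
    The degree is the cosine of the angle between the displacement b -> a
    and the leftward direction (-1, 0), clipped at 0. Directly left gives
    1.0, 45 degrees off gives about 0.71, and perpendicular or to the right
    gives 0.0.
    """
    dx, dy = a[0] - b[0], a[1] - b[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return 0.0  # coincident centroids: no directional relation
    # Dot product of the unit displacement with (-1, 0) is -dx / dist.
    return max(0.0, -dx / dist)


if __name__ == "__main__":
    print(left_of_degree((0, 0), (10, 0)))   # 1.0
    print(left_of_degree((0, 0), (10, 10)))  # about 0.707
    print(left_of_degree((20, 0), (10, 0)))  # 0.0
```

Because the result is graded rather than boolean, a downstream step can threshold or rank candidate relations before verbalizing them, which is in the spirit of the rank-ordering heuristic the abstract mentions; the authors' actual heuristics are not reproduced here.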

Knowledge based image and speech analysis for service robots
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824841
U. Ahlrichs, J. Fischer, Joachim Denzler, C. Drexler, H. Niemann, E. Noth, D. Paulus
Abstract: Active vision-based scene exploration, as well as speech understanding and dialogue, are important skills for a service robot that is employed in natural environments and has to interact with humans. In this paper we suggest a knowledge-based approach for both scene exploration and spoken dialogue using semantic networks. For scene exploration, the knowledge base contains information about camera movements and objects. In the dialogue system, the knowledge base contains information about the individual dialogue steps as well as about the syntax and semantics of utterances. To exploit this knowledge, an iterative control algorithm with real-time and any-time capabilities is applied. In addition, we propose appearance-based object models that can substitute for the object models represented in the knowledge base for scene exploration. In the experiments, we show the applicability of the approach to the exploration of office scenes and to spoken dialogues. The multi-sensory input can be integrated easily, since the knowledge about both application domains is represented in the same network formalism.
Citations: 19

From video to language: a detour via logic vs. jumping to conclusions
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824862
H. Nagel
Abstract: Temporal developments within a scene can be recorded by a video camera in the form of spatio-temporal gray-value variations. As a first step, digitizing the resulting video sequence and evaluating it algorithmically transforms the original signal into a geometric description comprising the shape, position, and trajectory of bodies in the depicted 3D scene. To facilitate communicating this information to human users, it appears advantageous, as a second step, to transform this geometric description into a fuzzy metric-temporal logic representation. The latter can in turn be processed by logic operations to extract the information of interest to a particular user at the time of his interaction with the system. This contribution discusses problems that arise in an attempt to specify and use a fuzzy metric-temporal logic representation of traffic situations at inner-city road intersections.
Citations: 8

Towards affective integration of vision, behavior, and speech processing
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824850
Naoyuki Okada, Kentaro Inui, M. Tokuhisa
Abstract: In each subfield of artificial intelligence, such as image understanding, speech understanding, and robotics, a tremendous amount of research effort has yielded considerable results. Unfortunately, the resulting techniques have ended up too different to be combined with one another straightforwardly. We have been conducting a case study, the AESOPWORLD project, aimed at establishing an architectural foundation for "integrated" intelligent agents. In this article, we first review our agent model, which integrates seven mental faculties (recognition, planning, action, desire, emotion, memory, and language) and two physical faculties (sensor and actuator). We then describe the recognition, action, and planning faculties and their interaction, centered on planning. Image understanding is treated as part of recognition. Next, we present dialogue processing, in which the recognition and planning faculties also play an essential role in communication. Finally, we discuss the emotion faculty to show an application of our agent to affective communication. This computation of emotions could be expected to serve as a basis for human-friendly interfaces.
Citations: 11

Learning audio-visual associations using mutual information
Proceedings Integration of Speech and Image Understanding. Pub Date: 1999-09-21. DOI: 10.1109/ISIU.1999.824909
D. Roy, B. Schiele, A. Pentland
Abstract: This paper addresses the problem of finding useful associations between audio and visual input signals. The proposed approach is based on maximizing the mutual information of audio-visual clusters. This approach segments continuous speech signals and finds visual categories that correspond to the segmented spoken words. Such audio-visual associations may be used to model infant language acquisition and to dynamically personalize speech-based human-computer interfaces for various applications, including catalog browsing and wearable computing. This paper describes an implemented system for learning shape names from camera and microphone input. We present evaluation results for the system in the domain of modeling language learning.
Citations: 33
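
The approach is described as maximizing the mutual information of audio-visual clusters. The quantity being maximized can be estimated from co-occurrence counts, as in the minimal sketch below. It assumes each observation has already been reduced to a discrete audio cluster label and a discrete visual category label; the function name and labels are hypothetical, and the search over speech segmentations and clusterings that the paper performs is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Estimate I(A; V) in bits from co-occurring (audio, visual) labels.

    I(A; V) = sum over (a, v) of p(a, v) * log2(p(a, v) / (p(a) * p(v))),
    with all probabilities taken as relative frequencies in `pairs`.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    audio = Counter(a for a, _ in pairs)
    visual = Counter(v for _, v in pairs)
    mi = 0.0
    for (a, v), count in joint.items():
        # p(a,v) / (p(a) p(v)) simplifies to count * n / (count_a * count_v)
        mi += (count / n) * log2(count * n / (audio[a] * visual[v]))
    return mi


if __name__ == "__main__":
    # Hypothetical data: spoken-word clusters paired with shape categories.
    consistent = [("ball", "circle"), ("box", "square")] * 2
    unrelated = [("ball", "circle"), ("ball", "square"),
                 ("box", "circle"), ("box", "square")]
    print(mutual_information(consistent))  # 1.0
    print(mutual_information(unrelated))   # 0.0
```

A pairing in which each spoken word reliably co-occurs with one shape scores higher than one in which words and shapes vary independently, so a procedure that maximizes this score favors word-to-shape associations that are consistently co-present.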