Towards Situated Imaging

Mingze Xi, Madhawa Perera, Stuart Anderson, Matt Adcock
{"title":"Towards Situated Imaging","authors":"Mingze Xi, Madhawa Perera, Stuart Anderson, Matt Adcock","doi":"10.1109/AIxVR59861.2024.00019","DOIUrl":null,"url":null,"abstract":"Integrating augmented reality (AR) with externally hosted computer vision (CV) models can provide enhanced AR experiences. For instance, by utilising an advanced object detection model, an AR system can recognise a range of predefined objects within the user’s immediate surroundings. However, existing AR-CV workflows rarely incorporate user-defined contextual information, which often come in the form of multi-modal queries blending both natural and body language. Interpreting these intricate user queries, processing them via a sequence of deep learning models, and then adeptly visualising the outcomes remains a formidable challenge.In this paper, we describe Situated Imaging (SI), an extensible array of techniques for in-situ interactive visual computing. We delineate the architecture of the Situated Imaging framework, which enhances the conventional AR-CV workflow by incorporating a range of advanced interactive and generative computer vision techniques. We also describe a demonstration implementation to illustrate the pipeline’s capabilities, enabling users to engage in activities such as labelling, highlighting, or generating content within a user-defined context. Furthermore, we provide initial guidance for tailoring this framework to example use cases and identify avenues for future research. Our model-agnostic Situated Imaging pipeline acts as a valuable starting point for both academic scholars and industry practitioners interested in enhancing the AR experience by incorporating computationally intensive AI models.","PeriodicalId":518749,"journal":{"name":"2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR)","volume":"67 3","pages":"85-89"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIxVR59861.2024.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Integrating augmented reality (AR) with externally hosted computer vision (CV) models can provide enhanced AR experiences. For instance, by utilising an advanced object detection model, an AR system can recognise a range of predefined objects within the user’s immediate surroundings. However, existing AR-CV workflows rarely incorporate user-defined contextual information, which often comes in the form of multi-modal queries blending natural language and body language. Interpreting these intricate user queries, processing them via a sequence of deep learning models, and then adeptly visualising the outcomes remains a formidable challenge. In this paper, we describe Situated Imaging (SI), an extensible array of techniques for in-situ interactive visual computing. We delineate the architecture of the Situated Imaging framework, which enhances the conventional AR-CV workflow by incorporating a range of advanced interactive and generative computer vision techniques. We also describe a demonstration implementation that illustrates the pipeline’s capabilities, enabling users to engage in activities such as labelling, highlighting, or generating content within a user-defined context. Furthermore, we provide initial guidance for tailoring this framework to example use cases and identify avenues for future research. Our model-agnostic Situated Imaging pipeline acts as a valuable starting point for both academic scholars and industry practitioners interested in enhancing the AR experience by incorporating computationally intensive AI models.
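The abstract characterises Situated Imaging only at an architectural level. As one concrete illustration of what a model-agnostic AR-CV pipeline of this kind might look like, here is a minimal Python sketch. It is not the authors' implementation: every name (SituatedQuery, Annotation, detect_stage, label_stage) is hypothetical, and it assumes that a multi-modal query pairs natural-language text with a pointing cue from gaze or gesture, with each stage standing in for a call to an externally hosted model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical multi-modal query: spoken or typed text plus a body-language
# cue, simplified here to a 2-D point the user indicated in the camera frame.
@dataclass
class SituatedQuery:
    text: str                         # e.g. "label the tools on this bench"
    pointer_xy: tuple[float, float]   # normalised coordinates from gaze/gesture
    frame: bytes                      # encoded camera frame from the AR device

# Result the AR client renders as an in-situ overlay.
@dataclass
class Annotation:
    label: str
    bbox: tuple[float, float, float, float]  # normalised (x, y, w, h)

# Model-agnostic stage: any callable mapping (query, annotations) -> annotations.
Stage = Callable[[SituatedQuery, list[Annotation]], list[Annotation]]

class SituatedImagingPipeline:
    """Chains externally hosted CV stages; the AR client sees only annotations."""

    def __init__(self, stages: list[Stage]):
        self.stages = stages

    def run(self, query: SituatedQuery) -> list[Annotation]:
        annotations: list[Annotation] = []
        for stage in self.stages:
            annotations = stage(query, annotations)
        return annotations

def detect_stage(query: SituatedQuery,
                 annotations: list[Annotation]) -> list[Annotation]:
    # Placeholder: a real stage would send query.frame and query.text to a
    # hosted detector and keep boxes near query.pointer_xy.
    return annotations

def label_stage(query: SituatedQuery,
                annotations: list[Annotation]) -> list[Annotation]:
    # Placeholder: a real stage would attach display labels for the renderer.
    return annotations

pipeline = SituatedImagingPipeline([detect_stage, label_stage])
result = pipeline.run(
    SituatedQuery("label the tools on this bench", (0.5, 0.5), b""))
```

Because each stage is just a callable over a shared annotation list, detectors, segmenters, or generative models could be swapped in without changing the AR client, which is consistent with the model-agnostic framing in the abstract.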