Towards Situated Imaging
Mingze Xi, Madhawa Perera, Stuart Anderson, Matt Adcock
2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), pp. 85–89, 17 January 2024. DOI: 10.1109/AIxVR59861.2024.00019
Abstract
Integrating augmented reality (AR) with externally hosted computer vision (CV) models can provide enhanced AR experiences. For instance, by utilising an advanced object detection model, an AR system can recognise a range of predefined objects within the user's immediate surroundings. However, existing AR-CV workflows rarely incorporate user-defined contextual information, which often comes in the form of multi-modal queries blending natural language and body language. Interpreting these intricate user queries, processing them via a sequence of deep learning models, and then adeptly visualising the outcomes remains a formidable challenge. In this paper, we describe Situated Imaging (SI), an extensible array of techniques for in-situ interactive visual computing. We delineate the architecture of the Situated Imaging framework, which enhances the conventional AR-CV workflow by incorporating a range of advanced interactive and generative computer vision techniques. We also describe a demonstration implementation that illustrates the pipeline's capabilities, enabling users to engage in activities such as labelling, highlighting, or generating content within a user-defined context. Furthermore, we provide initial guidance for tailoring this framework to example use cases and identify avenues for future research. Our model-agnostic Situated Imaging pipeline acts as a valuable starting point for both academic scholars and industry practitioners interested in enhancing the AR experience by incorporating computationally intensive AI models.
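To make the workflow described in the abstract concrete, the following is a minimal sketch of the query loop it outlines: an AR client packages a multi-modal query (camera frame, natural-language prompt, and a body-language cue such as a gaze point) and sends it to an externally hosted CV model, receiving detections to render as in-situ overlays. This is an illustration only, not the paper's implementation; the endpoint URL, payload fields, and response schema are all hypothetical assumptions.

```python
# Sketch of the AR-CV loop from the abstract: multi-modal query in,
# detections out. Endpoint and schema are hypothetical, not the paper's API.
import base64
import json
from dataclasses import dataclass
from urllib import request

SERVICE_URL = "http://cv-host.example.com/detect"  # hypothetical endpoint


@dataclass
class Detection:
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in frame pixels


def query_situated_model(frame_jpeg: bytes, prompt: str,
                         gaze_xy: tuple) -> list:
    """Send one multi-modal query to the hosted model and parse detections."""
    payload = json.dumps({
        "image": base64.b64encode(frame_jpeg).decode("ascii"),
        "prompt": prompt,       # natural-language part of the query
        "gaze": list(gaze_xy),  # body-language cue (e.g. a gaze point)
    }).encode("utf-8")
    req = request.Request(SERVICE_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=5.0) as resp:
        results = json.load(resp)
    return [Detection(r["label"], r["score"], tuple(r["box"]))
            for r in results.get("detections", [])]


# The AR runtime would then anchor each Detection.box in world space and
# draw a label or highlight, closing the query -> model -> overlay loop.
```

Keeping the model behind a plain HTTP boundary like this is one way to realise the model-agnostic design the paper argues for: the headset only ships pixels and context, so computationally intensive or generative models can be swapped on the server without changing the AR client.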