{"title":"Conceptual Exploration of Contextual Information for Situational\n Understanding","authors":"Stratis Aloimonos, A. Raglin","doi":"10.54941/ahfe1002855","DOIUrl":null,"url":null,"abstract":"The Army is often required to deploy soldiers into dangerous situations\n to offer assistance and relief. When deployed, these soldiers need to be\n aware of the potential dangers, properly assess the level of possible\n threats, and make the best choices to respond. One solution for this problem\n space is to have an intelligent system that recognizes scenes which may\n contain danger, regardless of the type or timeframe associated with that\n danger. This type of system would help make decisions about what to do in\n situations where danger may be prevalent. Thus, creating an intelligent\n system that could identify the scene and contextual information, for\n example, potential dangers, would provide greater situational understanding\n and support autonomous systems and solider interactions. As a proxy for\n representing scenes that may be similar to those encountered by soldiers, a\n set of images of natural or manmade disasters were selected and used to\n identify strengths and weaknesses in existing models for this type of\n intelligent system. In this work, images from CRISISMMD, a dataset of\n natural disasters tweets, as well as other images of disasters in the public\n domain which do not belong to any particular dataset, are used. For the\n initial phase of the work this dataset was used to determine and showcase\n the strengths and weaknesses of existing object recognition and visual\n question answering systems that when combined would create a prototype\n intelligent system. Specifically, YOLO (You Only Look Once), augmented with\n Word2Vec (a natural language processing (NLP) system which finds the\n similarities of different words in a very large corpus) was selected for\n performing the object recognition (Bochkovskiy et al. 2020). This system was\n selected to identify objects further based on the presence of other, similar\n objects using the similarities between their names. Also, CLIP (Contrastive\n Language Image Pretraining), which identifies the probabilities of scenes\n based on a certain number of possibilities and BLIP (Bootstrapping Language\n Image Pretraining) (Li et al. 2022), an advanced visual question answering\n system which is also capable of generating captions for images were\n explored. 
In addition, a concept of an intelligent system where contextual\n information is identified and utilized can be used to support situational\n understanding.","PeriodicalId":269162,"journal":{"name":"Proceedings of the 6th International Conference on Intelligent Human Systems Integration (IHSI 2023) Integrating People and Intelligent Systems, February 22–24, 2023, Venice, Italy","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Intelligent Human Systems Integration (IHSI 2023) Integrating People and Intelligent Systems, February 22–24, 2023, Venice, Italy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.54941/ahfe1002855","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The Army is often required to deploy soldiers into dangerous situations
to offer assistance and relief. When deployed, these soldiers need to be
aware of the potential dangers, properly assess the level of possible
threats, and make the best choices to respond. One solution for this problem
space is to have an intelligent system that recognizes scenes which may
contain danger, regardless of the type or timeframe associated with that
danger. This type of system would help make decisions about what to do in
situations where danger may be prevalent. Thus, creating an intelligent
system that could identify the scene and contextual information, for
example, potential dangers, would provide greater situational understanding
and support interactions between autonomous systems and soldiers. As a proxy for
representing scenes that may be similar to those encountered by soldiers, a
set of images of natural or manmade disasters was selected and used to
identify strengths and weaknesses in existing models for this type of
intelligent system. In this work, images from CRISISMMD, a dataset of
natural disaster tweets, are used, along with other public-domain images of
disasters that do not belong to any particular dataset. For the initial
phase of the work, this dataset was used to determine and showcase the
strengths and weaknesses of existing object recognition and visual question
answering systems that, when combined, would create a prototype intelligent
system. Specifically, YOLO (You Only Look Once), augmented with
Word2Vec (a natural language processing (NLP) system that finds the
similarities between different words across a very large corpus), was
selected to perform object recognition (Bochkovskiy et al. 2020). This
combination was chosen so that objects could be further identified based on
the presence of other, similar objects, using the similarities between their
names. Also explored were CLIP (Contrastive Language-Image Pre-training),
which estimates the probabilities of scenes from a given set of candidate
descriptions, and BLIP (Bootstrapping Language-Image Pre-training) (Li et
al. 2022), an advanced visual question answering system that can also
generate captions for images. In addition, a concept of an intelligent
system in which contextual information is identified and utilized can be
used to support situational understanding.
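
As a rough illustration of the Word2Vec augmentation described above, the sketch below scores how related a candidate label is to the labels already detected in an image, using similarity between word vectors. The pretrained model name, the example labels, and the scoring rule are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: relating detected object labels via word-vector similarity.
# Assumptions: gensim's pretrained "word2vec-google-news-300" vectors and
# hypothetical YOLO detections; the paper's exact setup may differ.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # pretrained Word2Vec vectors

detected = ["truck", "person", "boat"]     # hypothetical YOLO detections
candidate = "ambulance"                    # a label to relate to the scene

# Cosine similarity between the candidate label and each detected label.
scores = [wv.similarity(candidate, label) for label in detected
          if candidate in wv and label in wv]
if scores:
    print(f"mean similarity of '{candidate}' to detections: "
          f"{sum(scores) / len(scores):.3f}")
```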
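
Similarly, a minimal sketch of what "estimating the probabilities of scenes from a given set of candidate descriptions" with CLIP can look like, using the Hugging Face transformers implementation; the checkpoint name, image path, and candidate scene descriptions are assumptions for illustration only.

```python
# Sketch: zero-shot scene classification with CLIP over a fixed set of
# candidate scene descriptions. Checkpoint, image path, and labels are
# illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("disaster_scene.jpg")  # hypothetical input image
scenes = ["a flood", "a wildfire", "an earthquake", "a safe street scene"]

inputs = processor(text=scenes, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity for each candidate scene;
# softmax turns the scores into a probability distribution over the scenes.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze().tolist()
for scene, prob in zip(scenes, probs):
    print(f"{scene}: {prob:.3f}")
```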
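
A corresponding sketch of BLIP's captioning capability, again via transformers; the checkpoint and caption-only usage here are assumptions, and the paper's visual question answering usage may differ.

```python
# Sketch: generating a caption for an image with BLIP. The checkpoint name
# and image path are illustrative assumptions.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

image = Image.open("disaster_scene.jpg")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

# Generate a short caption describing the scene.
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```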