Emergent Languages from Pretrained Embeddings Characterize Latent Concepts in Dynamic Imagery

Int. J. Semantic Comput. Pub Date : 2020-09-01 DOI:10.1142/s1793351x20400140

James R. Kubricht, A. Santamaría-Pang, Chinmaya Devaraj, Aritra Chowdhury, P. Tu

Citations: 2

Abstract

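Recent unsupervised learning approaches have explored the feasibility of semantic analysis and interpretation of imagery using Emergent Language (EL) models. Because EL requires some form of numerical embedding as input, it remains unclear which type of embedding is required for the EL to properly capture the key semantic concepts associated with a given domain. In this paper, we compare unsupervised and supervised approaches for generating embeddings across two experiments. In Experiment 1, data are produced using a single-agent simulator: in each episode, a goal-driven agent attempts to accomplish a number of tasks in a synthetic cityscape environment that includes houses, banks, theaters and restaurants. In Experiment 2, a comparatively smaller dataset is produced in which one or more objects demonstrate various types of physical motion in a 3D simulator environment. We investigate whether EL models generated from embeddings of raw pixel data produce expressions that capture key latent concepts (i.e., an agent's motivations or physical motion types) in each environment. Our initial experiments show that the supervised learning approaches yield embeddings and EL descriptions that capture meaningful concepts from raw pixel inputs. In contrast, embeddings from an unsupervised learning approach result in greater ambiguity with respect to latent concepts.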
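The abstract does not say how "ambiguity with respect to latent concepts" is quantified. As a purely illustrative sketch, one plausible measure is the conditional entropy of the latent concept given the EL expression: it is zero when every expression maps to a single concept and grows as expressions are shared across concepts. The function name `conditional_entropy`, the example expressions, and the concept labels below are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def conditional_entropy(messages, concepts):
    """Return H(concept | message) in bits.

    Hypothetical ambiguity measure (an assumption, not the paper's metric):
    0.0 means each EL expression identifies exactly one latent concept.
    """
    if len(messages) != len(concepts):
        raise ValueError("messages and concepts must have the same length")
    n = len(messages)
    joint = Counter(zip(messages, concepts))  # counts of (message, concept) pairs
    msg_counts = Counter(messages)            # counts of each message
    h = 0.0
    for (msg, _concept), k in joint.items():
        # p(msg, concept) * log2 p(concept | msg)
        h -= (k / n) * math.log2(k / msg_counts[msg])
    return h


# Made-up expressions and concept labels, for illustration only.
unambiguous = conditional_entropy(
    ["a b", "a b", "c d", "c d"], ["eat", "eat", "move", "move"]
)
ambiguous = conditional_entropy(
    ["a", "a", "a", "a"], ["eat", "move", "eat", "move"]
)
print(unambiguous, ambiguous)
```

Under this assumed measure, an embedding whose EL expressions separate the concepts cleanly would score near zero, while one whose expressions are reused across concepts would score higher.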

Source journal

Int. J. Semantic Comput.

Self-citation rate

0.00%

Publication volume