The Ungrounded Alignment Problem

arXiv - CS - Neural and Evolutionary Computing, Pub Date: 2024-08-08, DOI: arxiv-2408.04242

Marc Pickett, Aakash Kumar Nain, Joseph Modayil, Llion Jones

Abstract

Modern machine learning systems have demonstrated substantial abilities with methods that either embrace or ignore human-provided knowledge, but combining the benefits of both styles remains a challenge. One particular challenge involves designing learning systems that exhibit built-in responses to specific abstract stimulus patterns, yet are still plastic enough to be agnostic about the modality and exact form of their inputs. In this paper, we investigate what we call The Ungrounded Alignment Problem, which asks: How can we build predefined knowledge into a system where we don't know how a given stimulus will be grounded? This paper examines a simplified version of the general problem, where an unsupervised learner is presented with a sequence of images for the characters in a text corpus, and this learner is later evaluated on its ability to recognize specific (possibly rare) sequential patterns. Importantly, the learner is given no labels during learning or evaluation, but must map each image from an unknown font or permutation to its correct class label. That is, at no point is our learner given labeled images, where an image vector is explicitly associated with a class label. Despite ample work in unsupervised and self-supervised loss functions, all current methods require a labeled fine-tuning phase to map the learned representations to correct classes. Finding this mapping in the absence of labels may seem a fool's errand, but our main result resolves this seeming paradox. We show that leveraging only letter bigram frequencies is sufficient for an unsupervised learner both to reliably associate images with class labels and to reliably identify trigger words in the sequence of inputs. More generally, this method suggests an approach for encoding specific desired innate behaviour in modality-agnostic models.
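
The role of bigram frequencies in recovering labels can be illustrated with a toy sketch. This is not the authors' method: it assumes the images have already been grouped into anonymous cluster IDs (the clustering step is skipped entirely), uses a four-symbol alphabet so that every assignment can be tried by brute force, and, for simplicity, computes the reference statistics from the same toy text that is observed. The names `bigram_counts`, `recover_mapping`, and `hidden` are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def bigram_counts(seq, symbols):
    """Count every ordered pair of adjacent symbols in seq."""
    counts = {(a, b): 0 for a in symbols for b in symbols}
    for a, b in zip(seq, seq[1:]):
        counts[(a, b)] += 1
    return counts


def recover_mapping(cluster_seq, reference, alphabet):
    """Try every cluster->letter assignment and keep the one whose
    bigram counts are closest (squared error) to the reference counts.
    Brute force is only feasible for tiny alphabets (assumption)."""
    clusters = sorted(set(cluster_seq))
    observed = bigram_counts(cluster_seq, clusters)
    best, best_cost = None, None
    for perm in permutations(alphabet, len(clusters)):
        guess = dict(zip(clusters, perm))
        cost = sum(
            (n - reference[(guess[a], guess[b])]) ** 2
            for (a, b), n in observed.items()
        )
        if best_cost is None or cost < best_cost:
            best, best_cost = guess, cost
    return best


# Prior knowledge: bigram statistics of the language (toy corpus).
text = "ten teen tenet net tee"
alphabet = sorted(set(text))
reference = bigram_counts(text, alphabet)

# The learner only sees anonymous cluster IDs; `hidden` stands in for
# the unknown font or permutation and is never passed to the learner.
hidden = {"t": 2, "e": 0, "n": 3, " ": 1}
cluster_seq = [hidden[ch] for ch in text]

mapping = recover_mapping(cluster_seq, reference, alphabet)
decoded = "".join(mapping[c] for c in cluster_seq)

# Once clusters are labeled, a trigger word can be searched for directly.
trigger_found = "tenet" in decoded
print(mapping, decoded, trigger_found)
```

In this toy case the bigram table has no symmetry that would let two symbols be swapped unnoticed, so the assignment is recovered without any labeled example. A realistic setting would use reference statistics from a separate corpus and a search procedure that scales beyond a handful of symbols.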
