用释义研究语境嵌入的性质

North American Chapter of the Association for Computational Linguistics Pub Date : 2022-07-12 DOI:10.48550/arXiv.2207.05553

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

{"title":"用释义研究语境嵌入的性质","authors":"Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea","doi":"10.48550/arXiv.2207.05553","DOIUrl":null,"url":null,"abstract":"We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database’s alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT’s layers.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Using Paraphrases to Study Properties of Contextual Embeddings\",\"authors\":\"Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea\",\"doi\":\"10.48550/arXiv.2207.05553\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database’s alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT’s layers.\",\"PeriodicalId\":382084,\"journal\":{\"name\":\"North American Chapter of the Association for Computational Linguistics\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"North American Chapter of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2207.05553\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2207.05553","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

我们使用释义作为独特的数据来源来分析上下文嵌入，特别关注BERT。由于释义自然地编码一致的单词和短语语义，因此它们为研究嵌入的属性提供了独特的视角。使用释义数据库的对齐，我们研究释义中的单词以及短语表示。我们发现上下文嵌入可以有效地处理多义词，但在许多情况下会给同义词提供令人惊讶的不同表示。我们证实了之前的发现，BERT对词序很敏感，但在BERT各层的语境化水平方面，我们发现了与之前的研究略有不同的模式。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Using Paraphrases to Study Properties of Contextual Embeddings

We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database’s alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT’s layers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

North American Chapter of the Association for Computational Linguistics

自引率

0.00%

发文量