Is Cosine-Similarity of Embeddings Really About Similarity?

ArXiv. Publication date: 2024-03-08. DOI: 10.1145/3589335.3651526

Harald Steck, Chaitanya Ekanadham, Nathan Kallus



Abstract

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. In practice, this can work better, but sometimes also worse, than the unnormalized dot-product between embedded vectors. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless 'similarities.' For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
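The definition in the abstract, and one way non-unique similarities can arise, can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's derivation: it assumes a model whose predictions depend only on user-item dot products, so that multiplying each embedding dimension of the user vectors by a positive factor and dividing the same dimension of the item vectors by that factor leaves every prediction unchanged. The vectors, the scaling factors, and all function names below are hypothetical.

```python
import math


def dot(u, v):
    """Unnormalized dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]


def cosine_similarity(u, v):
    """Cosine of the angle, computed as the dot product of the normalizations."""
    return dot(normalize(u), normalize(v))


def rescale(vec, scales):
    """Multiply each embedding dimension by its own factor."""
    return [a * s for a, s in zip(vec, scales)]


# Hypothetical 2-dimensional embeddings (made-up numbers).
user = [1.0, 2.0]
item_a = [2.0, 1.0]
item_b = [1.0, 3.0]

# Assumed per-dimension rescaling: users get the factors, items get the inverses.
scales = [10.0, 0.1]
inv_scales = [1.0 / s for s in scales]

user_r = rescale(user, scales)
item_a_r = rescale(item_a, inv_scales)
item_b_r = rescale(item_b, inv_scales)

# The user-item dot product (the model's prediction) is the same before and after.
print("dot before:", dot(user, item_a), "after:", dot(user_r, item_a_r))

# The item-item cosine-similarity is not the same before and after.
print("cosine before:", cosine_similarity(item_a, item_b))
print("cosine after: ", cosine_similarity(item_a_r, item_b_r))
```

Both sets of embeddings give identical dot-product predictions, yet the cosine-similarity between the two items differs, which illustrates why cosine-similarity can depend on an arbitrary choice rather than on the data alone.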


