LayerFlow: Layer-wise Exploration of LLM Embeddings using Uncertainty-aware Interlinked Projections

Impact factor 2.9; Zone 4 (Computer Science); JCR Q2, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum, published 2025-05-23, DOI: 10.1111/cgf.70123

Rita Sevastjanova, Robin Gerling, Thilo Spinner, Mennatallah El-Assady

Volume 44, Issue 3 (Journal Article)

Citations: 0

Abstract

Large language models (LLMs) represent words through contextual word embeddings encoding different language properties like semantics and syntax. Understanding these properties is crucial, especially for researchers investigating language model capabilities, employing embeddings for tasks related to text similarity, or evaluating the reasons behind token importance as measured through attribution methods. Applications for embedding exploration frequently involve dimensionality reduction techniques, which reduce high-dimensional vectors to two dimensions used as coordinates in a scatterplot. This data transformation step introduces uncertainty that can be propagated to the visual representation and influence users' interpretation of the data. To communicate such uncertainties, we present LayerFlow – a visual analytics workspace that displays embeddings in an interlinked projection design and communicates the transformation, representation, and interpretation uncertainty. In particular, to hint at potential data distortions and uncertainties, the workspace includes several visual components, such as convex hulls showing 2D and HD clusters, data point pairwise distances, cluster summaries, and projection quality metrics. We show the usability of the presented workspace through replication and expert case studies that highlight the need to communicate uncertainty through multiple visual components and different data perspectives.
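The abstract names projection quality metrics and convex hulls over 2D and HD clusters as hints of distortion, but it does not say which metrics or reducer LayerFlow uses. The following is a minimal sketch under our own assumptions, not the authors' implementation: PCA stands in for the dimensionality reduction, a per-point k-nearest-neighbour preservation score stands in for a projection quality metric, Andrew's monotone chain computes the hulls, and synthetic Gaussian clusters replace real LLM embeddings. All function names are hypothetical.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project rows of X (n_points x n_dims) onto their first two principal components."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of each row (Euclidean), excluding the row itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def neighbourhood_preservation(X_hd: np.ndarray, X_2d: np.ndarray, k: int = 5) -> np.ndarray:
    """Per-point fraction of HD k-nearest neighbours that remain 2D k-nearest neighbours.

    1.0 means the local neighbourhood survived the projection; low values flag
    points whose 2D position may be misleading.
    """
    hd = knn_indices(X_hd, k)
    ld = knn_indices(X_2d, k)
    return np.array([len(set(a) & set(b)) / k for a, b in zip(hd, ld)])


def convex_hull(points: np.ndarray) -> np.ndarray:
    """2D convex hull via Andrew's monotone chain; vertices returned counter-clockwise."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it repeats the other chain's first point.
    return np.array(lower[:-1] + upper[:-1])


# Toy stand-in for one layer's token embeddings: two Gaussian clusters in 64 dims.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 64)),
    rng.normal(4.0, 1.0, size=(20, 64)),
])
xy = pca_2d(emb)                                   # 2D scatterplot coordinates
scores = neighbourhood_preservation(emb, xy, k=5)  # per-point quality signal
hull = convex_hull(xy[:20])                        # 2D outline of HD cluster 0
```

Here the HD cluster labels are known by construction; in practice a clustering run in the original space would supply them. Points with low `scores` are candidates for highlighting, and if the 2D hulls of two HD clusters overlap, the apparent mixing may partly reflect projection distortion rather than the embeddings themselves. A layer-wise view would repeat these steps for each layer's embedding matrix.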


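Source journal

Computer Graphics Forum (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 5.80

Self-citation rate: 12.00%

Articles published: 175

Review time: 3-6 weeks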

Journal description: Computer Graphics Forum is the official journal of Eurographics, published in cooperation with Wiley-Blackwell, and is a unique, international source of information for computer graphics professionals interested in graphics developments worldwide. It is now one of the leading journals for researchers, developers and users of computer graphics in both commercial and academic environments. The journal reports on the latest developments in the field throughout the world and covers all aspects of the theory, practice and application of computer graphics.