Neural text embeddings in psychological research: A guide with examples in R.

Impact Factor 7.8 · JCR Q1, Psychology (Multidisciplinary) · CAS Tier 1 (Psychology)
Louis Teitelbaum, Almog Simchon
{"title":"Neural text embeddings in psychological research: A guide with examples in R.","authors":"Louis Teitelbaum, Almog Simchon","doi":"10.1037/met0000768","DOIUrl":null,"url":null,"abstract":"<p><p>In this guide, we review neural embedding models and compare three methods of quantifying psychological constructs for use with embeddings: distributed dictionary representation, contextualized construct representation, and a novel approach: correlational anchored vectors. We aim to cultivate an intuition for the geometric properties of neural embeddings and a sensitivity to methodological problems that can arise in their use. We argue that while large language model embeddings have the advantage of contextualization, decontextualized word embeddings may have more ability to generalize across text genres when using cosine or dot product similarity metrics. The three methods of operationalizing psychological constructs in vector space likewise each have their advantages in particular applications. We recommend distributed dictionary representation, which derives a vector representation from a word list, for quantifying abstract constructs relating to the overall feel of a text, especially when the research requires that these constructs generalize across multiple genres of text. We recommend contextualized construct representation, which derives a representation from a questionnaire, for cases in which texts are relatively similar in content to the embedded questionnaire, such as experiments in which participants are asked to respond to a related prompt. Correlational anchored vectors, which derives a representation from labeled examples, requires suitably large and reliable training data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":""},"PeriodicalIF":7.8000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Psychological methods","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1037/met0000768","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

In this guide, we review neural embedding models and compare three methods of quantifying psychological constructs for use with embeddings: distributed dictionary representation, contextualized construct representation, and a novel approach: correlational anchored vectors. We aim to cultivate an intuition for the geometric properties of neural embeddings and a sensitivity to methodological problems that can arise in their use. We argue that while large language model embeddings have the advantage of contextualization, decontextualized word embeddings may have more ability to generalize across text genres when using cosine or dot product similarity metrics. The three methods of operationalizing psychological constructs in vector space likewise each have their advantages in particular applications. We recommend distributed dictionary representation, which derives a vector representation from a word list, for quantifying abstract constructs relating to the overall feel of a text, especially when the research requires that these constructs generalize across multiple genres of text. We recommend contextualized construct representation, which derives a representation from a questionnaire, for cases in which texts are relatively similar in content to the embedded questionnaire, such as experiments in which participants are asked to respond to a related prompt. Correlational anchored vectors, which derives a representation from labeled examples, requires suitably large and reliable training data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
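
The abstract mentions cosine and dot product similarity and the idea of deriving a construct vector from a word list (distributed dictionary representation). The base-R sketch below is a minimal toy illustration of that geometry, not the authors' code: the 5-dimensional embeddings, the example word list, and the helper names (cosine_sim, ddr_vector) are made-up assumptions, and in practice the vectors would come from a pretrained embedding model.

# Toy illustration of distributed dictionary representation (DDR):
# average the embeddings of a construct word list, then score a text
# by cosine similarity. Vectors here are made-up 5-dimensional toys;
# in practice they would come from a pretrained embedding model.

# Hypothetical pretrained word embeddings (rows = words)
word_embeddings <- rbind(
  calm    = c( 0.9,  0.1, 0.0,  0.2, 0.1),
  relaxed = c( 0.8,  0.2, 0.1,  0.1, 0.0),
  anxious = c(-0.7,  0.6, 0.2, -0.1, 0.3)
)

# Cosine similarity between two vectors
# (the unnormalized dot product, sum(a * b), is the alternative metric
# mentioned in the abstract)
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# DDR construct vector: mean of the word vectors in a construct word list
calm_dictionary <- c("calm", "relaxed")
ddr_vector <- colMeans(word_embeddings[calm_dictionary, , drop = FALSE])

# Hypothetical document embedding (e.g., mean of its word vectors)
doc_embedding <- c(0.7, 0.2, 0.1, 0.0, 0.1)

# Construct score for the document: cosine similarity to the DDR vector;
# a value near 1 indicates the text leans toward the "calm" construct
cosine_sim(doc_embedding, ddr_vector)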

Source journal: Psychological Methods (Psychology, Multidisciplinary) · CiteScore 13.10 · Self-citation rate 7.10% · Articles published: 159
Journal description: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.