The principal components of meaning, revisited
Chris Westbury, Michelle Yang, Kris Anderson
Psychonomic Bulletin & Review, published 2024-08-22
DOI: 10.3758/s13423-024-02551-y
Citations: 0
Abstract
Osgood, Suci, and Tannenbaum were the first to attempt to identify the principal components of semantics by applying dimensionality reduction to a high-dimensional model of semantics constructed from human judgments of word relatedness. Modern word-embedding models analyze patterns of word use in large text corpora to construct higher-dimensional models of semantics that can be subjected to dimensionality reduction in the same way. Hollis and Westbury characterized the first eight principal components (PCs) of a word-embedding model by correlating them with several well-known lexical measures, such as log-transformed word frequency, age of acquisition, valence, arousal, dominance, and concreteness. The results showed some clear differences in how the PCs could be interpreted. Here, we extend this work by analyzing a larger word-embedding matrix using semantic measures initially derived from subjective inspection of the PCs. We then use quantitative analysis to confirm the utility of these subjective measures for predicting PC values, and we cross-validate them on two word-embedding matrices developed from distinct corpora. Several semantic and word-class measures are strongly predictive of early PC values, including first- and second-person verbs, personal relevance of abstract and concrete words, affect terms, and names of places and people. The predictors of the lowest-magnitude PCs generalized well to word-embedding matrices constructed from separate corpora, including matrices constructed using different word-embedding methods. The predictive categories we describe are consistent with Wittgenstein's argument that an autonomous level of social interaction grounds linguistic meaning.
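The general approach of correlating embedding PCs with lexical measures can be illustrated with a minimal sketch. It assumes a words x dimensions embedding matrix held in NumPy and a per-word lexical measure aligned with its rows. The function names `pc_scores` and `correlate_pcs` are hypothetical, PCA is computed here via an SVD of the centered matrix, and the data are synthetic. This is not the authors' code or pipeline.

```python
import numpy as np


def pc_scores(embeddings: np.ndarray, n_components: int = 8) -> np.ndarray:
    """Project each word (row) onto the first n_components principal components.

    PCA via SVD of the column-centered matrix: rows of `vt` are the principal
    axes, ordered by decreasing singular value (i.e. explained variance).
    """
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def correlate_pcs(scores: np.ndarray, measure: np.ndarray) -> np.ndarray:
    """Pearson r between each PC's word scores and one per-word lexical measure."""
    return np.array(
        [np.corrcoef(scores[:, k], measure)[0, 1] for k in range(scores.shape[1])]
    )


if __name__ == "__main__":
    # Synthetic stand-in data (an assumption, not real norms): 500 "words" x 50
    # dimensions, with one hypothetical per-word norm (e.g. valence ratings)
    # planted along a single direction, plus small isotropic noise.
    rng = np.random.default_rng(0)
    n_words, dim = 500, 50
    norm = rng.normal(size=n_words)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    embeddings = 0.1 * rng.normal(size=(n_words, dim)) + 3.0 * np.outer(norm, direction)

    scores = pc_scores(embeddings, n_components=8)
    r = correlate_pcs(scores, norm)
    # The sign of each PC is arbitrary, so report absolute correlations.
    print(np.round(np.abs(r), 2))
```

Because each PC's sign is arbitrary, comparisons across matrices built from different corpora are safest on absolute correlations or after aligning signs. A 0/1 indicator for a word class (for example, a hypothetical `is_place_name` vector) can be passed as `measure`. In that case the Pearson r is the point-biserial correlation.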
About the journal:
The journal provides coverage spanning a broad spectrum of topics in all areas of experimental psychology. The journal is primarily dedicated to the publication of theory and review articles and brief reports of outstanding experimental work. Areas of coverage include cognitive psychology broadly construed, including but not limited to action, perception, & attention, language, learning & memory, reasoning & decision making, and social cognition. We welcome submissions that approach these issues from a variety of perspectives such as behavioral measurements, comparative psychology, development, evolutionary psychology, genetics, neuroscience, and quantitative/computational modeling. We particularly encourage integrative research that crosses traditional content and methodological boundaries.