Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal.
Gonzalo Martínez, Juan Diego Molero, Sandra González, Javier Conde, Marc Brysbaert, Pedro Reviriego
Behavior Research Methods, volume 57, issue 1, page 5. Published 2024-12-04.
DOI: https://doi.org/10.3758/s13428-024-02515-z
Abstract
This study investigates the potential of large language models (LLMs) to provide accurate estimates of concreteness, valence, and arousal for multi-word expressions. Unlike previous artificial intelligence (AI) methods, LLMs can capture the nuanced meanings of multi-word expressions. We systematically evaluated GPT-4o's ability to predict concreteness, valence, and arousal. In Study 1, GPT-4o's estimates correlated strongly with human concreteness ratings (r = .8) for multi-word expressions. In Study 2, these findings were replicated for valence and arousal ratings of individual words, with GPT-4o matching or outperforming previous AI models. Studies 3–5 extended the valence and arousal analysis to multi-word expressions and showed good validity of the LLM-generated estimates for these stimuli as well. To help researchers with stimulus selection, we provide datasets with LLM-generated norms of concreteness, valence, and arousal for 126,397 English single words and 63,680 multi-word expressions.
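The validity check described in the abstract is a correlation between LLM-generated estimates and human ratings for the same items. The following is a minimal sketch of that kind of comparison, not the authors' actual pipeline: the function name `pearson_r`, the dictionary layout, and the rating values are all assumptions made up for illustration and are not taken from the published datasets.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy ratings on a 1-5 concreteness scale (illustrative only,
# NOT values from the paper or its norms).
human_ratings = {
    "ice cream": 4.8,
    "common sense": 1.6,
    "traffic light": 4.7,
    "point of view": 1.8,
    "rule of thumb": 2.1,
}
llm_estimates = {
    "ice cream": 4.9,
    "common sense": 1.9,
    "traffic light": 4.6,
    "point of view": 2.2,
    "rule of thumb": 1.7,
}

# Compare only the expressions present in both sets, in a fixed order.
expressions = sorted(human_ratings.keys() & llm_estimates.keys())
r = pearson_r(
    [human_ratings[e] for e in expressions],
    [llm_estimates[e] for e in expressions],
)
print(f"r = {r:.2f} over {len(expressions)} expressions")
```

With real norms, the two dictionaries would be loaded from the human rating file and the LLM-generated file, and the same intersection step would restrict the comparison to items rated in both.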
About the journal:
Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.