The Efficacy of Large Language Models and Crowd Annotation for Accurate Content Analysis of Political Social Media Messages

Jennifer Stromer-Galley, Brian McKernan, Saklain Zaman, Chinmay Maganur, Sampada Regmi

Social Science Computer Review, 43(1). Published May 2, 2025. DOI: 10.1177/08944393251334977
Abstract:
Systematic content analysis of messaging has been a staple method in the study of communication. While computer-assisted content analysis has been used in the field for three decades, advances in machine learning and crowd-based annotation, combined with the ease of collecting large volumes of text-based communication via social media, have made classifying messages easier and faster. The greatest advancement yet may be general intelligence large language models (LLMs), which are ostensibly able to classify messages accurately and reliably by leveraging context to disambiguate meaning. It is unclear, however, how effective LLMs are at performing content analysis. In this study, we compare the classification of political candidates' social media messages by trained annotators, crowd annotators, and large language models from OpenAI accessed through the free Web interface (ChatGPT) and the paid API (GPT API), across five categories of political communication commonly used in the literature. We find that crowd annotation generally produced higher F1 scores than ChatGPT and an earlier version of the GPT API, although the newest version, the GPT-4 API, performed well compared with the crowd and with ground-truth data derived from trained student annotators. This study suggests that applying any LLM to an annotation task requires validation, and that freely available and older LLM models may not be effective for studying human communication.
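To make the comparison concrete: the F1 score referenced above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Below is a minimal sketch of the kind of validation workflow the abstract calls for, using the openai Python client (v1+) and scikit-learn. The prompt wording, the "attack" category, the example messages, and their labels are hypothetical illustrations, not the paper's actual coding instrument or data.

# A minimal validation sketch, assuming the openai Python client (v1+)
# and scikit-learn are installed; prompt, category, and data are hypothetical.
from openai import OpenAI
from sklearn.metrics import f1_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are a content-analysis annotator. Label the following political "
    "candidate social media message with 1 if it attacks an opponent, "
    "otherwise 0. Answer with a single digit.\n\nMessage: {message}"
)

def classify(message: str, model: str = "gpt-4") -> int:
    """Ask the model for a binary label; treat anything but a leading '1' as 0."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce output variability for annotation tasks
        messages=[{"role": "user", "content": PROMPT.format(message=message)}],
    )
    return 1 if response.choices[0].message.content.strip().startswith("1") else 0

# Hypothetical validation set: ground-truth labels from trained annotators.
messages = [
    "My opponent voted to raise your taxes three times.",
    "Join us at the campus rally tonight at 7 pm!",
]
truth = [1, 0]

predictions = [classify(m) for m in messages]
# F1 = 2 * precision * recall / (precision + recall)
print("F1 vs. ground truth:", f1_score(truth, predictions))

As the abstract cautions, a check like this against a labeled validation set should precede any full-scale use of an LLM as an annotator.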
Journal Introduction:
Unique Scope: Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as the societal impacts of information technology. Topics include: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, and World Wide Web resources for social scientists.

Interdisciplinary Nature: Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.