Large language models as a substitute for human experts in annotating political text

Research & Politics, Pub Date: 2024-01-01, DOI: 10.1177/20531680241236239

Michael Heseltine, Bernhard Clemm von Hohenberg


Citations: 1

Abstract


Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which place a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a ‘hybrid’ coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity.
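The 'hybrid' coding approach mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract states only that disagreements between multiple GPT-4 runs are adjudicated by a human expert. The function names, the `adjudicate` hook, the rule of accepting a label only when all runs agree, and the example labels are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def hybrid_code(
    runs: Sequence[Sequence[str]],
    adjudicate: Callable[[int, list[str]], str],
) -> tuple[list[str], list[int]]:
    """Combine labels from several LLM runs, deferring disagreements to a human.

    runs[r][i] is the label that run r assigned to text i.
    adjudicate(i, labels) is a hypothetical hook standing in for the human expert.
    Returns the final labels and the indices of texts that needed adjudication.
    """
    if not runs:
        raise ValueError("at least one run is required")
    n_texts = len(runs[0])
    if any(len(run) != n_texts for run in runs):
        raise ValueError("all runs must label the same number of texts")

    final: list[str] = []
    flagged: list[int] = []
    for i in range(n_texts):
        labels = [run[i] for run in runs]
        if len(set(labels)) == 1:
            # All runs agree: accept the label without human review.
            final.append(labels[0])
        else:
            # Runs disagree: record the index and ask the human expert.
            flagged.append(i)
            final.append(adjudicate(i, labels))
    return final, flagged


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of texts whose predicted label matches the expert (gold) label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Made-up labels for three texts from three runs (illustrative only).
example_runs = [
    ["political", "political", "not political"],
    ["political", "not political", "not political"],
    ["political", "political", "not political"],
]
# Made-up human decisions, keyed by text index.
human_decisions = {1: "political"}

example_final, example_flagged = hybrid_code(
    example_runs, lambda i, labels: human_decisions[i]
)
```

In this sketch the human expert reviews only the texts in `example_flagged`, so the manual workload scales with how often the runs disagree rather than with the size of the corpus.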

