Do AIs know what the most important issue is? Using language models to code open-text social survey responses at scale

Jonathan Mellon, J. Bailey, Ralph Scott, James Breckwoldt, Marta Miori, Phillip Schmedeman
Journal: Research & Politics
DOI: 10.1177/20531680241231468
Published: 2024-01-01
Citations: 1

Abstract

Can artificial intelligence accurately label open-text survey responses? We compare the accuracy of six large language models (LLMs) using a few-shot approach, three supervised learning algorithms (SVM, DistilRoBERTa, and a neural network trained on BERT embeddings), and a second human coder on the task of categorizing “most important issue” responses from the British Election Study Internet Panel into 50 categories. For the scenario where a researcher lacks existing training data, the accuracy of the highest-performing LLM (Claude-1.3: 93.9%) neared human performance (94.7%) and exceeded the highest-performing supervised approach trained on 1000 randomly sampled cases (neural network: 93.5%). In a scenario where previous data has been labeled but a researcher wants to label novel text, the best LLM’s (Claude-1.3: 80.9%) few-shot performance is only slightly behind the human (88.6%) and exceeds the best supervised model trained on 576,000 cases (DistilRoBERTa: 77.8%). PaLM-2, Llama-2, and the SVM all performed substantially worse than the best LLMs and supervised models across all metrics and scenarios. Our results suggest that LLMs may allow for greater use of open-ended survey questions in the future.
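The few-shot approach described above amounts to showing the LLM a handful of labeled responses before asking it to categorize a new one. A minimal sketch of how such a prompt might be assembled is below; the category names and example responses are illustrative assumptions, not the paper's actual 50-category British Election Study scheme or its exact prompt wording.

```python
# Illustrative few-shot prompt construction for coding "most important
# issue" (MII) survey responses. Categories and examples are hypothetical
# stand-ins for the paper's 50-category scheme.

FEW_SHOT_EXAMPLES = [
    ("the cost of living keeps going up", "Inflation/Cost of living"),
    ("getting a GP appointment is impossible", "NHS/Health"),
    ("small boats crossing the channel", "Immigration"),
]

CATEGORIES = ["Inflation/Cost of living", "NHS/Health", "Immigration", "Other"]

def build_prompt(response: str) -> str:
    """Assemble a few-shot classification prompt for one open-text response."""
    lines = [
        "Classify each survey response into exactly one category.",
        "Categories: " + "; ".join(CATEGORIES),
        "",
    ]
    # Each labeled example demonstrates the response -> category mapping.
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Response: {text}\nCategory: {label}\n")
    # The unlabeled response is left open for the model to complete.
    lines.append(f"Response: {response}\nCategory:")
    return "\n".join(lines)

prompt = build_prompt("rents are far too high")
```

The resulting string would then be sent to an LLM API, whose single-label completion is compared against the human coder's label to compute the accuracy figures reported above.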