量化不同语言之间的叙事相似性

IF 6.5 2区社会学 Q1 SOCIAL SCIENCES, MATHEMATICAL METHODS

Sociological Methods & Research Pub Date : 2025-06-02 DOI:10.1177/00491241251340080

Hannah Waight, Solomon Messing, Anton Shirikov, Margaret E. Roberts, Jonathan Nagler, Jason Greenfield, Megan A. Brown, Kevin Aslett, Joshua A. Tucker

{"title":"量化不同语言之间的叙事相似性","authors":"Hannah Waight, Solomon Messing, Anton Shirikov, Margaret E. Roberts, Jonathan Nagler, Jason Greenfield, Megan A. Brown, Kevin Aslett, Joshua A. Tucker","doi":"10.1177/00491241251340080","DOIUrl":null,"url":null,"abstract":"How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institution themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of claims rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: The spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, diffusion of policy and cultural objects, among other topics.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"62 1","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Quantifying Narrative Similarity Across Languages\",\"authors\":\"Hannah Waight, Solomon Messing, Anton Shirikov, Margaret E. Roberts, Jonathan Nagler, Jason Greenfield, Megan A. Brown, Kevin Aslett, Joshua A. Tucker\",\"doi\":\"10.1177/00491241251340080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institution themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of claims rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: The spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, diffusion of policy and cultural objects, among other topics.\",\"PeriodicalId\":21849,\"journal\":{\"name\":\"Sociological Methods & Research\",\"volume\":\"62 1\",\"pages\":\"\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-06-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sociological Methods & Research\",\"FirstCategoryId\":\"90\",\"ListUrlMain\":\"https://doi.org/10.1177/00491241251340080\",\"RegionNum\":2,\"RegionCategory\":\"社会学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"SOCIAL SCIENCES, MATHEMATICAL METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sociological Methods & Research","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1177/00491241251340080","RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}

引用次数: 0

摘要

人们如何理解思想在文本数据中的传播？这是社会学研究中的一个关键测量问题，从研究利益集团如何塑造媒体话语，到跨机构政策的传播，再到组织结构和机构本身的扩散。为了研究思想和叙事是如何在文本中传播的，我们必须首先开发一种方法来确定文本是否共享相同的信息和叙事，而不是相同的广泛主题或确切特征。我们提出了一种新的方法来衡量这种兴趣量，我们称之为“叙事相似性”，通过使用大型语言模型提取文本的核心思想，然后比较主张的相似性，而不是单词，短语或句子的相似性。与过去的相关替代方案相比，结果是一个更接近于叙事相似性的估计，包括精确的文本重用，它返回词汇相似的文档；主题建模，返回主题相似的文档；或者一系列的替代方法。我们设计了一种方法来提供样本外的性能度量（精度、召回率、F1），并表明我们的方法在很大程度上优于相关的替代方法。我们将我们的方法应用于一个重要的案例研究：俄罗斯关于乌克兰生物武器计划发展的说法在美国主流和边缘新闻网站上的传播。虽然我们在这个应用程序中关注的是新闻，但我们的方法可以更广泛地应用于研究宣传、错误信息、政策传播和文化对象等主题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Quantifying Narrative Similarity Across Languages

How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institution themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of claims rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: The spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, diffusion of policy and cultural objects, among other topics.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Sociological Methods & Research Multiple-

CiteScore

16.30

自引率

3.20%

发文量

期刊介绍： Sociological Methods & Research is a quarterly journal devoted to sociology as a cumulative empirical science. The objectives of SMR are multiple, but emphasis is placed on articles that advance the understanding of the field through systematic presentations that clarify methodological problems and assist in ordering the known facts in an area. Review articles will be published, particularly those that emphasize a critical analysis of the status of the arts, but original presentations that are broadly based and provide new research will also be published. Intrinsically, SMR is viewed as substantive journal but one that is highly focused on the assessment of the scientific status of sociology. The scope is broad and flexible, and authors are invited to correspond with the editors about the appropriateness of their articles.