Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012.

IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data Pub Date : 2025-09-24 DOI:10.1038/s41597-025-05558-9

Adam Breuer, Bryce J Dietrich, Michael H Crespin, Matthew Butler, J A Pryse, Kosuke Imai

Scientific Data, vol. 12, no. 1, 1552. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460618/pdf/

Citations: 0

Abstract

This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation has led many to rely on smaller subsets. We design a large-scale, parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, storyboarding, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.
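The four stages named in the abstract (preparing, transcribing, storyboarding, summarizing) can be pictured as a per-video chain of steps that is run over many ads in parallel. The following is only a minimal sketch of that shape, not the authors' codebase: `AdRecord`, the four stage functions, and `run_pipeline` are hypothetical names, and each stage body is a placeholder where a real system would call video tooling, a speech-to-text model, or an LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class AdRecord:
    """Hypothetical container for one advertisement as it moves through the stages."""
    ad_id: str
    video_path: str
    transcript: str = ""
    storyboard: list = field(default_factory=list)
    summary: str = ""


def prepare(record):
    # Placeholder: a real step might normalize the video format or extract audio.
    return record


def transcribe(record):
    # Placeholder: a real step would call a speech-to-text model here.
    record.transcript = f"[transcript of {record.video_path}]"
    return record


def storyboard(record):
    # Placeholder: a real step would sample key frames and describe them.
    record.storyboard = [f"[frame description for {record.ad_id}]"]
    return record


def summarize(record):
    # Placeholder: a real step would prompt an LLM with the transcript and storyboard.
    record.summary = (
        f"Summary of {record.ad_id} from {len(record.storyboard)} frame(s) "
        "and its transcript."
    )
    return record


# Stage order follows the abstract: prepare, transcribe, storyboard, summarize.
STAGES = (prepare, transcribe, storyboard, summarize)


def process_ad(record):
    """Run every stage, in order, on a single ad."""
    for stage in STAGES:
        record = stage(record)
    return record


def run_pipeline(records, max_workers=4):
    """Process ads in parallel; each ad is independent of the others."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input records.
        return list(pool.map(process_ad, records))


if __name__ == "__main__":
    ads = [AdRecord(ad_id=f"ad{i}", video_path=f"ads/ad{i}.mp4") for i in range(3)]
    for done in run_pipeline(ads):
        print(done.ad_id, "->", done.summary)
```

Because each ad is processed independently, the work parallelizes across ads without coordination between workers; a thread pool is used here only for illustration, and a process pool or job queue would fit the same structure.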

Source journal

Scientific Data Social Sciences-Education

CiteScore

11.20

Self-citation rate

4.10%

Articles published

689

Review time

16 weeks

About the journal: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across the natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.