This Sample Seems to Be Good Enough! Assessing Coverage and Temporal Reliability of Twitter’s Academic API

Proceedings of the International AAAI Conference on Web and Social Media. Pub Date: 2023-06-02. DOI: 10.1609/icwsm.v17i1.22182

Jürgen Pfeffer, Angelina Mooseder, Jana Lasser, Luca Hammer, Oliver Stritzel, David Garcia



Abstract

Because of its willingness to share data with academia and industry, Twitter has been the primary social media platform for scientific research as well as for consulting businesses and governments in the last decade. In recent years, a series of publications have studied and criticized Twitter's APIs, and Twitter has partially adapted its existing data streams. The newest Twitter API for Academic Research allows researchers to "access Twitter's real-time and historical public data with additional features and functionality that support collecting more precise, complete, and unbiased datasets." The main new feature of this API is the possibility of accessing the full archive of all historic Tweets. In this article, we take a closer look at the Academic API and try to answer two questions. First, are the datasets collected with the Academic API complete? Second, since Twitter's Academic API delivers historic Tweets as represented on Twitter at the time of data collection, how much data is lost over time due to Tweet and account removal from the platform? Our work shows evidence that Twitter's Academic API can indeed create (almost) complete samples of Twitter data based on a wide variety of search terms. We also provide evidence that Twitter's data endpoint v2 delivers better samples than the previously used endpoint v1.1. Furthermore, collecting Tweets with the Academic API at the time of studying a phenomenon, rather than creating local archives of stored Tweets, allows for a straightforward way of following Twitter's developer agreement. Finally, we discuss technical artifacts and implications of the Academic API. We hope that our work can add another layer of understanding of Twitter data collections, leading to more reliable studies of human behavior via social media data.

