Seeder finder: identifying additional needles in the Twitter haystack

Nick Gramsky, H. Samet
Workshop on Location-based Social Networks, published 2013-11-05. DOI: 10.1145/2536689.2536808 (https://doi.org/10.1145/2536689.2536808)
Citations: 32

Abstract

TwitterStand is a novel way to track the news cycle by allowing people to view and browse the news with a map query interface. TF-IDF scores for each document that is linked to by a tweet (also termed twanchor [22] when the document is a news article) are calculated after they enter the system and pass initial classification filters. These scores are used to cluster similar tweets. Clusters must contain tweets from reputable sources in order for the clusters to form. These reputable sources are known as seeders as they essentially seed a cluster. Seeders have become an integral part of the TwitterStand architecture. An optimal system monitors the set of seeders in order to find newsworthy tweets quickly. This paper proposes methods to improve the current list of seeders by augmenting the pool with previously undiscovered users while routinely eliminating those that do not bring any value. We consider a successful seeder one who is timely in the reporting of large newsworthy events. An analysis of the current seeders precedes a proposed approach and serves as the basis for quantifying future seeder churn. A qualitative analysis based on that approach is conducted in an effort to quantitatively evaluate the process.
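
The seeding rule described in the abstract (TF-IDF scores drive clustering, and a cluster can only form around a tweet from a reputable source) can be illustrated with a minimal sketch. This is not TwitterStand's implementation: the function names, the cosine-similarity measure, the 0.3 threshold, the smoothed IDF formula, and the use of tweet text in place of the linked document are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed weighting: relative term frequency times (log(n/df) + 1).
        vectors.append({
            term: (count / len(tokens)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_tweets(tweets, seeders, threshold=0.3):
    """Cluster (user, text) pairs in arrival order.

    A tweet joins the most similar existing cluster if the similarity to
    that cluster's seed tweet reaches the threshold. Otherwise it starts a
    new cluster only when its author is a seeder; non-seeder tweets that
    match nothing are dropped (a simplification of this sketch).
    Returns a list of clusters, each a list of tweet indices.
    """
    vectors = tfidf_vectors([text for _, text in tweets])
    clusters = []  # each cluster: list of tweet indices, seed tweet first
    for i, (user, _) in enumerate(tweets):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vectors[i], vectors[cluster[0]])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(i)
        elif user in seeders:
            clusters.append([i])
    return clusters


# Hypothetical example data: only "newswire" is treated as a seeder.
tweets = [
    ("randomuser", "earthquake hits city center"),
    ("newswire", "earthquake hits city center downtown"),
    ("randomuser", "strong earthquake hits city center"),
    ("otheruser", "my cat likes pizza"),
]
print(cluster_tweets(tweets, seeders={"newswire"}))
```

In this sketch the first tweet is discarded because no seeder has yet opened a cluster it could join, which shows why timely seeders matter: the sooner a seeder reports an event, the sooner related tweets can be grouped.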