Seeder finder: identifying additional needles in the Twitter haystack
Nick Gramsky, H. Samet
Workshop on Location-based Social Networks, 5 November 2013
DOI: 10.1145/2536689.2536808
Citations: 32
Abstract
TwitterStand is a novel system for tracking the news cycle that lets people view and browse the news through a map query interface. When a tweet enters the system and passes the initial classification filters, TF-IDF scores are calculated for the document it links to (also termed a twanchor [22] when the document is a news article). These scores are used to cluster similar tweets. A cluster can form only if it contains tweets from reputable sources. These reputable sources are known as seeders because they essentially seed a cluster. Seeders have become an integral part of the TwitterStand architecture. An optimal system monitors the set of seeders in order to find newsworthy tweets quickly.
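To make the interplay between TF-IDF scoring, clustering, and seeders concrete, here is a minimal sketch. It assumes an online leader-follower scheme using cosine similarity over TF-IDF vectors, a naive whitespace tokenizer, a smoothed IDF in the style of scikit-learn's `smooth_idf` option, a similarity threshold of 0.3, and the simplified rule that only a seeder's tweet may open a new cluster. The tweet text stands in for the linked document. All of these choices, and the names `Tweet`, `Cluster`, `cluster_stream`, and `threshold`, are hypothetical illustrations rather than TwitterStand's actual implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tweet:
    user: str
    text: str  # hypothetical stand-in for the text of the linked document


@dataclass
class Cluster:
    centroid: Counter = field(default_factory=Counter)  # summed TF-IDF vectors
    members: list = field(default_factory=list)


def tfidf(tokens, doc_freq, n_docs):
    """TF-IDF weights using a smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    tf = Counter(tokens)
    return Counter({t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                    for t, c in tf.items()})


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(tweets, seeders, threshold=0.3):
    """Hypothetical online clustering in which only seeders may open clusters."""
    doc_freq, n_docs, clusters = Counter(), 0, []
    for tweet in tweets:
        tokens = tweet.text.lower().split()  # naive whitespace tokenizer
        n_docs += 1
        doc_freq.update(set(tokens))  # each term counted once per document
        vec = tfidf(tokens, doc_freq, n_docs)

        # Find the most similar existing cluster, if any.
        best, best_score = None, 0.0
        for c in clusters:
            score = cosine(vec, c.centroid)
            if score > best_score:
                best, best_score = c, score

        if best is not None and best_score >= threshold:
            best.members.append(tweet)
            best.centroid.update(vec)  # Counter.update adds the weights
        elif tweet.user in seeders:
            # Simplified seeder rule: a new cluster starts only from a seeder's tweet.
            clusters.append(Cluster(centroid=Counter(vec), members=[tweet]))
        # Otherwise the tweet is discarded: no matching cluster and not a seeder.
    return clusters
```

For example, with seeders `{"alice_news", "reuters"}`, a tweet from `bob` that repeats most of `alice_news`'s wording joins her cluster, while an unrelated tweet from a non-seeder is dropped because it cannot start a cluster of its own. Summing vectors into the centroid is enough here because cosine similarity ignores vector length.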
This paper proposes methods to improve the current list of seeders by augmenting the pool with previously undiscovered users while routinely eliminating those that bring no value. We consider a seeder successful if it reports large newsworthy events in a timely manner. An analysis of the current seeders precedes the proposed approach and serves as the basis for quantifying future seeder churn. A qualitative analysis based on that approach is then conducted in an effort to quantitatively evaluate the process.
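One way to make the timeliness criterion and the resulting churn concrete is the sketch below. The scoring rule (counting how often a user is among the first `early_k` posters in a cluster of at least `min_size` posts), the pruning of seeders that score zero, the promotion threshold `promote_at`, and all function names are hypothetical illustrations, not the measures used in the paper.

```python
from collections import defaultdict


def timeliness_scores(clusters, min_size=3, early_k=2):
    """Count how often each user is among the first `early_k` posters of a large cluster.

    `clusters` is a list of clusters, each a list of (user, timestamp) pairs.
    Clusters with fewer than `min_size` posts are treated as not newsworthy
    (a hypothetical stand-in for "large newsworthy events").
    """
    scores = defaultdict(int)
    for posts in clusters:
        if len(posts) < min_size:
            continue
        # Earliest post time for each user in this cluster.
        first_seen = {}
        for user, ts in posts:
            if user not in first_seen or ts < first_seen[user]:
                first_seen[user] = ts
        # Credit the users who reported the event earliest.
        for user in sorted(first_seen, key=first_seen.get)[:early_k]:
            scores[user] += 1
    return dict(scores)


def update_seeders(seeders, scores, promote_at=2):
    """Hypothetical churn step: drop zero-score seeders, add strong non-seeders."""
    kept = {s for s in seeders if scores.get(s, 0) > 0}
    added = {u for u, n in scores.items() if u not in seeders and n >= promote_at}
    return kept | added
```

Under these assumptions, a seeder that never reports early on a large event is eliminated, and a previously undiscovered user who repeatedly does is added to the pool.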