A robust algorithm for determining the newsworthiness of microblogs

P. K. K. Madhawa, Ajantha S Atukorale
DOI: 10.1109/ICTER.2015.7377679
Published in: 2015 Fifteenth International Conference on Advances in ICT for Emerging Regions (ICTer), 2015-08-01
Citations: 2

Abstract

Microblogging platforms such as Twitter have become a primary medium for people to share their experiences and opinions on a broad range of topics. Because posts on Twitter are publicly viewable by default, Twitter can be used to obtain up-to-date information on events such as natural disasters, disease outbreaks, or sports events. Building a cohesive summary from tweets about long-running events is a problem that interests the research community. However, because so many tweets express user opinions and sentiments about a topic, newsworthy tweets must first be extracted from the large stream of tweets on a single topic. Most existing methods for this extraction require large hand-labeled corpora to train the model, which is impractical for a rapidly updating medium like Twitter. In this paper we address this problem by introducing a novel heuristic-based annotation scheme that generates the training dataset for the system. A hand-labeled corpus of tweets is used only for benchmarking the objectivity classifier. Our classifier achieves an F1-score of 80% on a manually annotated gold-standard dataset.
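The abstract describes the pipeline only at a high level: heuristics annotate tweets automatically to build a training set, an objectivity classifier is trained on that set, and a small hand-labeled gold set is used only for evaluation with F1. The Python sketch below illustrates that general weak-supervision pattern under stated assumptions; it is not the authors' method. The cue words, the URL rule, the helper `heuristic_label`, and the use of a multinomial Naive Bayes classifier are all hypothetical stand-ins, since the abstract names neither the heuristics nor the model. The toy tweets are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical heuristic cues (assumption): the abstract does not list the
# paper's actual annotation rules.
SUBJECTIVE_CUES = {"i", "me", "my", "love", "hate", "awesome", "terrible", "lol", "omg"}
URL_RE = re.compile(r"https?://\S+")


def tokenize(text):
    """Lower-case alphabetic tokens with URLs stripped out."""
    return re.findall(r"[a-z']+", URL_RE.sub(" ", text.lower()))


def heuristic_label(tweet):
    """Hypothetical rule: 1 = objective, 0 = subjective, None = abstain."""
    has_url = URL_RE.search(tweet) is not None
    subjective = bool(set(tokenize(tweet)) & SUBJECTIVE_CUES) or "!" in tweet
    if has_url and not subjective:
        return 1
    if subjective and not has_url:
        return 0
    return None  # ambiguous tweets are left out of the training set


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(doc))
        self.vocab_size = len(set().union(*self.word_counts.values()))
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, doc, c):
        """Log prior plus smoothed log likelihood of the doc's tokens under class c."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        for w in tokenize(doc):
            score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + self.vocab_size))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


def f1_score(gold, pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (objective) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): the heuristics label the stream with no manual effort.
stream = [
    "Magnitude 6.1 earthquake reported off the coast http://example.com/quake",
    "Flood warning issued for three districts http://example.com/flood",
    "Officials confirm 12 new cases of dengue http://example.com/dengue",
    "omg I hate this rain so much!!",
    "my team is awesome lol",
    "I love match days!",
]
weak = [(t, heuristic_label(t)) for t in stream]
weak = [(t, y) for t, y in weak if y is not None]
model = NaiveBayes().fit([t for t, _ in weak], [y for _, y in weak])

# A small hand-labeled gold set is used only for evaluation, never for training.
gold_tweets = ["Earthquake reported in the region", "I love this so much"]
gold_labels = [1, 0]
preds = [model.predict(t) for t in gold_tweets]
print(preds, f1_score(gold_labels, preds))
```

The key design point the sketch mirrors is the separation of roles: heuristic labels feed training, while manual labels appear only in the evaluation step. Letting the heuristics abstain on ambiguous tweets trades training-set size for label quality.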