Detecting Journalistic Relevance on Social Media: A two-case study using automatic surrogate features

Á. Figueira, N. Guimarães
Published in: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
Publication date: 2017-07-31
DOI: https://doi.org/10.1145/3110025.3122120
Citations: 7

Abstract

The expansion of social networks has contributed to the propagation of information relevant to general audiences. However, this information is a small percentage of all the data shared on such online platforms, which also includes private or personal information, simple chat messages, and the recently so-called 'fake news'. In this paper, we make an exploratory analysis of two social networks to extract features that are indicators of relevant information in social network messages. Our goal is to build accurate machine learning models that are capable of detecting what is journalistically relevant. We conducted two experiments on CrowdFlower to build a solid ground truth for the models, comparing the number of evaluations per post against the number of posts classified. The results show evidence that increasing the number of samples results in better performance on the relevancy classification task, even when the number of evaluations per post is reduced. In addition, the results show significant correlations between the relevance of a post and both its interest and whether it is meaningful to the majority of people. Finally, we achieve approximately 80% accuracy in the task of relevance detection using a small set of learning algorithms.
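The abstract does not say how the crowd evaluations for each post were combined into a single ground-truth label. As a minimal sketch, assuming a simple majority vote with tied posts excluded, the aggregation step could look like the following. The function names, the label strings, and the toy judgments are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_label(judgments):
    """Return the most common label among one post's crowd judgments.

    Returns None for an empty list or a tie, so that ambiguous posts
    can be left out of the ground truth (an assumption of this sketch).
    """
    counts = Counter(judgments).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def build_ground_truth(judgments_by_post):
    """Map each post id to its majority label, skipping undecided posts."""
    truth = {}
    for post_id, judgments in judgments_by_post.items():
        label = majority_label(judgments)
        if label is not None:
            truth[post_id] = label
    return truth


# Hypothetical toy judgments for illustration only (not data from the paper).
toy_judgments = {
    "post_a": ["relevant", "relevant", "not_relevant"],
    "post_b": ["not_relevant", "not_relevant", "not_relevant"],
    "post_c": ["relevant", "not_relevant"],  # tie, so it is dropped
}
ground_truth = build_ground_truth(toy_judgments)
```

Under a fixed annotation budget, fewer evaluations per post leave room for more labeled posts but make ties and noisy labels more likely, which is the trade-off the two experiments compare.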