Detecting Journalistic Relevance on Social Media: A two-case study using automatic surrogate features

Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
Pub Date: 2017-07-31
DOI: 10.1145/3110025.3122120

Á. Figueira, N. Guimarães


Citations: 7

Abstract


The expansion of social networks has contributed to the propagation of information relevant to general audiences. However, this is a small percentage of all the data shared on such online platforms, which also includes private/personal information, simple chat messages and the recently so-called 'fake news'. In this paper, we make an exploratory analysis of two social networks to extract features that are indicators of relevant information in social network messages. Our goal is to build accurate machine learning models that are capable of detecting what is journalistically relevant. We conducted two experiments on CrowdFlower to build a solid ground truth for the models, comparing the number of evaluations per post against the number of posts classified. The results show evidence that increasing the number of samples results in better performance on the relevancy classification task, even when relaxing the number of evaluations per post. In addition, the results show that there are significant correlations between the relevance of a post and its interest, and whether it is meaningful for the majority of people. Finally, we achieve approximately 80% accuracy in the task of relevance detection using a small set of learning algorithms.

