Real-Time Prediction of Online False Information Purveyors and their Characteristics

InfoSciRN: Machine Learning (Sub-Topic) Pub Date : 2020-10-30 DOI:10.2139/ssrn.3725919

Anil R. Doshi, S. Raghavan, W. Schmidt



Abstract

Disinformation, misinformation, and other 'fake news' (collectively, false information) are quick and inexpensive to create and distribute in our increasingly digital and connected world. Identifying false information early and cost-effectively can offset some of those operational advantages. In this paper, we develop lightweight machine learning models that use (1) a novel data set tracking browsing behavior and (2) domain registration data, which are available for every website from the time it is established. Using only the domain registration data, we develop and demonstrate a machine learning classifier that identifies, at the time a domain is registered, domains that will go on to produce false information. We then combine these data with our browsing data and develop a machine learning classifier that identifies false information domains whose content is most associated with higher levels of consumption. Finally, we use our data to identify false information domains that will cease operations after an event of interest, in our case the 2016 U.S. presidential election. We theorize that the last category involves actors seeking primarily to manipulate perceptions and outcomes of that event.
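
As a rough illustration of what a lightweight classifier over registration-time data could look like, the sketch below trains a small logistic regression in plain Python. It is not the authors' model: the abstract does not name the features or the algorithm, so the three binary features, the synthetic training rows, and the function names (`train`, `predict_proba`) are all assumptions made for illustration.

```python
import math

# Hypothetical registration-time features (NOT taken from the paper):
#   [uses_privacy_proxy, minimum_one_year_term, name_contains_news_keyword]
# Label 1 = domain later produced false information, 0 = it did not.
# The rows below are synthetic toy data, invented only for this sketch.
TRAIN = [
    ([1.0, 1.0, 1.0], 1),
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 0.0, 1.0], 1),
    ([0.0, 0.0, 0.0], 0),
    ([0.0, 0.0, 1.0], 0),
    ([0.0, 1.0, 0.0], 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Estimated probability that a domain belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(rows, epochs=2000, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            # Gradient of the log loss with respect to the linear score.
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


weights, bias = train(TRAIN)

# Score a newly registered (hypothetical) domain at registration time.
new_domain = [1.0, 1.0, 1.0]
score = predict_proba(weights, bias, new_domain)
print(f"estimated probability of false information: {score:.3f}")
```

Because every input here would be known the moment a domain is registered, a model of this shape can score a domain before it publishes any content, which is the property the abstract emphasizes. A real system would need actual registration records, a held-out evaluation set, and most likely a richer model.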


