Using BERT to Extract Topic-Independent Sentiment Features for Social Media Bot Detection

2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). Publication date: 2020-10-28. DOI: 10.1109/UEMCON51285.2020.9298158

Maryam Heidari, James H. Jones

Citations: 61

Abstract

Millions of online posts about different topics and products are shared on popular social media platforms. One use of this content is to provide crowd-sourced information about a specific topic, event, or product. However, this use raises an important question: what percentage of the information available through these services is trustworthy? In particular, might some of this information be generated by a machine, i.e., a "bot", instead of a human? Bots can be, and often are, purposely designed to generate enough volume to skew an apparent trend or position on a topic, yet the consumer of such content cannot easily distinguish a bot post from a human post. This paper introduces a new model that uses Bidirectional Encoder Representations from Transformers (Google BERT) for sentiment classification of tweets to identify topic-independent features for a social media bot detection model. Deriving topic-independent features with a Natural Language Processing approach distinguishes this work from previous bot detection models. We achieve 94% accuracy in classifying the contents of the Cresci data set [1] as generated by a bot or a human, whereas the most accurate prior work achieved 92%.
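The abstract does not say which features are derived from the sentiment classifier's output. As a rough illustration only, the sketch below assumes each tweet has already been assigned one of three sentiment labels (by BERT or any other classifier) and summarizes an account by the fraction of its tweets in each class. Such a summary ignores what the tweets are about, which is one way a feature could be topic-independent. The label set, the function name `sentiment_features`, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumed label set; the paper's actual sentiment classes are not given in the abstract.
LABELS = ("negative", "neutral", "positive")


def sentiment_features(tweet_labels):
    """Return the fraction of an account's tweets in each sentiment class.

    The result is ordered as in LABELS. An account with no tweets
    yields an all-zero vector.
    """
    total = len(tweet_labels)
    if total == 0:
        return [0.0] * len(LABELS)
    counts = Counter(tweet_labels)
    return [counts[label] / total for label in LABELS]


# Hypothetical per-tweet labels for one account, as a classifier might emit them.
account_labels = ["positive", "positive", "neutral", "positive"]
features = sentiment_features(account_labels)
print(features)  # [0.0, 0.25, 0.75]
```

A vector like this could be passed, alongside other account features, to any downstream bot-versus-human classifier; that step is likewise an assumption here.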
