Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

Danielly Sorato, Renato Fileto
Published in: Proceedings of the XV Brazilian Symposium on Information Systems, 2019-05-20
DOI: 10.1145/3330204.3330228 (https://doi.org/10.1145/3330204.3330228)
Citation count: 2

Abstract

Microblog posts (e.g., tweets) often contain users' opinions and thoughts about events, products, people, and organizations, among other topics. However, social media is also frequently used to promote online disinformation and manipulation. Analyzing the characteristics of such discourse is essential for understanding and countering these actions. Extracting recurrent fragments of text, i.e., word sequences, that are semantically similar can lead to the discovery of linguistic patterns used in certain kinds of discourse. We therefore aim to use such patterns to capture discourses that are frequently expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United States presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we extract sequences of words that share a similar meaning in their word embedding representation. In the experiments, we investigate the incidence of SSP instances regarding political adversaries and the media in tweets posted by Donald Trump during the presidential election campaign. Experimental results show that certain statements by Donald Trump about his adversaries, and certain expressions in those tweets, recur with high frequency.
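
The abstract does not describe the SSP mining algorithm itself, so the following is only a minimal sketch of the general idea of grouping word sequences whose embedding representations are close. It is not the authors' method. The toy embedding values, the averaging of word vectors, the greedy grouping, the function names, and the similarity threshold are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical 2-dimensional embeddings, invented for illustration only.
# A real setup would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "crooked": (0.9, 0.1),
    "dishonest": (0.85, 0.15),
    "media": (0.1, 0.9),
    "press": (0.15, 0.85),
    "great": (-0.8, 0.2),
    "rally": (-0.2, -0.9),
}


def ngrams(tokens, n):
    """Return all contiguous word sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sequence_vector(seq, embeddings):
    """Represent a sequence by the average of its word vectors.

    Averaging ignores word order; it is a simplification assumed here.
    """
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    for word in seq:
        for d, value in enumerate(embeddings[word]):
            total[d] += value
    return tuple(t / len(seq) for t in total)


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_similar_sequences(posts, embeddings, n=2, threshold=0.95):
    """Greedily group n-grams whose averaged vectors are close.

    Each group keeps the vector of its first member as representative.
    Words missing from the embeddings are dropped before n-grams are built,
    so an n-gram may span a dropped word.
    """
    groups = []  # list of (representative vector, {sequence: count})
    for post in posts:
        tokens = [t for t in post.lower().split() if t in embeddings]
        for seq in ngrams(tokens, n):
            vec = sequence_vector(seq, embeddings)
            for rep, members in groups:
                if cosine(rep, vec) >= threshold:
                    members[seq] += 1
                    break
            else:
                groups.append((vec, defaultdict(int, {seq: 1})))
    return [dict(members) for _, members in groups]
```

Calling `mine_similar_sequences(["crooked media", "dishonest press", "great rally"], TOY_EMBEDDINGS)` places "crooked media" and "dishonest press" in one group and "great rally" in another, because the first two sequences have nearly identical averaged vectors under these toy values. Counting how often each group occurs would then give a rough measure of how frequently a pattern recurs.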