A Data Mining Framework for Social Graph Generation and Analysis

Danish Kumar, Md. Imran Hossain Showrov
DOI: 10.1109/ICIET48527.2019.9290584
Published in: 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET)
Publication date: 2019-12-23
URL: https://doi.org/10.1109/ICIET48527.2019.9290584
Citations: 2

Abstract

Social networking services have become popular and easy to access, so the number of users on social networks is increasing rapidly. As a result, both the networks and the volume of user-generated content grow daily. This creates a need to capture such large amounts of data and analyze them for purposes such as targeted marketing, recommender system design, open-source intelligence, and cybersecurity. Twitter is one of the most popular social networking (microblogging) sites and is widely used for news updates, information sharing, viral marketing, and similar purposes, in posts of up to 280 characters. In this paper, we apply a text analytics framework to analyze Twitter data at different levels of granularity. A distinguishing feature of the proposed framework is that it exploits both content and structural information for tweet analysis. The framework first models the tweets as a multi-attributed graph, in which tweets are represented as nodes and inter-tweet relationships are represented as edges. Nodes are labeled with features identified from the tweets using NLP techniques, whereas edges are labeled with metadata (such as hashtags, mentions, and followers) common to the tweets they connect. To analyze the multi-attributed graph, we consider two algorithms, MAG-Dist and MAG-Sim, which convert the multi-attributed graph into a simple and similarity graph. Finally, we apply the MCL (Markov Clustering) algorithm to cluster the nodes of the multi-attributed graph, so that each cluster consists of a subset of tweets representing an event. The proposed approach is evaluated experimentally on a real dataset crawled from Twitter.
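
The pipeline described above (tweets to multi-attributed graph, then to a similarity graph, then MCL clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define MAG-Dist or MAG-Sim, so a weighted Jaccard score stands in for the graph conversion step, simple tokenization stands in for the NLP feature extraction, and the tweets, the `WEIGHTS` values, and all function names are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical toy tweets; the paper's crawled dataset is not reproduced here.
TWEETS = [
    "Flood warning issued for the river area #flood @cityalerts",
    "River levels rising fast, stay safe #flood",
    "Evacuation routes posted for flood zones @cityalerts",
    "Great goal in the final tonight #football @sportsdesk",
    "Football final goes to penalties #football",
    "Penalties decide the final, what a match @sportsdesk",
]

# Assumed per-attribute weights (not taken from the paper).
WEIGHTS = {"words": 1.0, "hashtags": 2.0, "mentions": 2.0}

def tweet_node(text):
    """Node attributes: content words plus hashtag and mention metadata."""
    text = text.lower()
    return {
        # Words of 4+ letters that are not part of a hashtag or mention.
        "words": set(re.findall(r"(?<![#@\w])[a-z]{4,}", text)),
        "hashtags": set(re.findall(r"#(\w+)", text)),
        "mentions": set(re.findall(r"@(\w+)", text)),
    }

def build_mag(tweets):
    """Multi-attributed graph: an edge carries every attribute two tweets share."""
    nodes = [tweet_node(t) for t in tweets]
    edges = {}
    for i, j in combinations(range(len(nodes)), 2):
        shared = {k: nodes[i][k] & nodes[j][k] for k in nodes[i]}
        if any(shared.values()):
            edges[(i, j)] = shared
    return nodes, edges

def similarity_matrix(nodes, edges):
    """Collapse edge attributes into one weight (weighted Jaccard stand-in)."""
    n = len(nodes)
    sim = [[0.0] * n for _ in range(n)]
    for (i, j), shared in edges.items():
        score = 0.0
        for key, weight in WEIGHTS.items():
            union = nodes[i][key] | nodes[j][key]
            if union:
                score += weight * len(shared[key]) / len(union)
        sim[i][j] = sim[j][i] = score
    return sim

def normalize(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]

def mcl(sim, inflation=2.0, iterations=50):
    """Basic Markov Clustering: alternate expansion and inflation."""
    n = len(sim)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize([[sim[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
                   for r in range(n)])
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
             for r in range(n)]
        # Inflation: raise entries to a power and renormalize columns.
        m = normalize([[v ** inflation for v in row] for row in m])
    # Each node joins the cluster of the row (attractor) holding most of its mass.
    groups = {}
    for c in range(n):
        attractor = max(range(n), key=lambda r: m[r][c])
        groups.setdefault(attractor, []).append(c)
    return sorted(groups.values())

nodes, edges = build_mag(TWEETS)
clusters = mcl(similarity_matrix(nodes, edges))
print(clusters)
```

On this toy input the flood tweets and the football tweets share no attributes, so they form separate graph components and MCL keeps them in separate clusters. With real data, the inflation parameter controls how finely events are split and would need tuning.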