A Data Mining Framework for Social Graph Generation and Analysis

2019 2nd International Conference on Innovation in Engineering and Technology (ICIET); Pub Date: 2019-12-23; DOI: 10.1109/ICIET48527.2019.9290584

Danish Kumar, Md. Imran Hossain Showrov


Citations: 2

Abstract


A Data Mining Framework for Social Graph Generation and Analysis

Because social networking services are increasingly popular and easy to access, the number of users in social networks is growing rapidly. As a result, both the networks and the volume of user-generated content grow day by day. This creates a need to capture such large amounts of data and analyze them for purposes such as targeted marketing, recommender system design, open-source intelligence, and cybersecurity. Twitter is one of the most popular social networking (microblogging) sites and is widely used for news updates, information sharing, viral marketing, and similar purposes, in posts of up to 280 characters. In this paper, we apply a text analytics framework to analyze Twitter data at different levels of granularity. A distinguishing feature of the proposed framework is that it exploits both content and structural information for tweet analysis. The framework first models the tweets as a multi-attributed graph, in which tweets are represented as nodes and inter-tweet relationships are represented as edges. Nodes are labeled with features extracted from the tweets using NLP techniques, whereas edges are labeled with the metadata (such as hashtags, mentions, and followers) that is common to the tweets they connect. To analyze the multi-attributed graph, we consider two algorithms, MAG-Dist and MAG-Sim, which convert the multi-attributed graph into a simple graph and a similarity graph. Finally, we apply the MCL (Markov Clustering) algorithm to cluster the nodes of the multi-attributed graph, so that each cluster consists of a subset of tweets representing an event. The proposed approach is evaluated experimentally on a real dataset crawled from Twitter.
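
The pipeline described in the abstract (tweets as nodes, shared metadata as edges, conversion to a similarity graph, then MCL clustering) can be illustrated with a minimal sketch. The abstract does not define MAG-Dist or MAG-Sim, so the sketch below substitutes a plain Jaccard overlap of hashtags and mentions as a stand-in edge weight; the toy tweets, the function names `jaccard`, `similarity_matrix`, and `mcl`, and all parameter values are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical toy tweets carrying only the metadata used for edges.
tweets = [
    {"id": 0, "hashtags": {"quake", "nepal"}, "mentions": {"redcross"}},
    {"id": 1, "hashtags": {"quake", "nepal"}, "mentions": set()},
    {"id": 2, "hashtags": {"quake"}, "mentions": {"redcross"}},
    {"id": 3, "hashtags": {"worldcup", "final"}, "mentions": {"fifa"}},
    {"id": 4, "hashtags": {"worldcup"}, "mentions": {"fifa"}},
]


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(tweets):
    """Collapse the per-attribute edges into one weight per tweet pair.

    Assumption: a simple average of hashtag and mention overlap stands in
    for MAG-Sim, whose definition the abstract does not give.
    """
    n = len(tweets)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = 0.5 * (
                jaccard(tweets[i]["hashtags"], tweets[j]["hashtags"])
                + jaccard(tweets[i]["mentions"], tweets[j]["mentions"])
            )
            sim[i, j] = sim[j, i] = s
    return sim


def mcl(sim, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Basic Markov Clustering on a symmetric similarity matrix."""
    m = sim + np.eye(len(sim))            # self-loops keep every column non-zero
    m = m / m.sum(axis=0, keepdims=True)  # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow
        m = np.power(m, inflation)                # inflation: favor strong flow
        m = m / m.sum(axis=0, keepdims=True)
        if np.allclose(m, prev, atol=tol):
            break
    # Each row that still carries flow is an attractor; the columns it
    # reaches form one cluster. Identical clusters are merged by the set.
    clusters = set()
    for row in m:
        members = frozenset(np.nonzero(row > 1e-5)[0].tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


clusters = mcl(similarity_matrix(tweets))
print(clusters)
```

In this toy input the earthquake tweets (0 to 2) and the World Cup tweets (3 and 4) share no hashtags or mentions, so no flow passes between the two groups and MCL returns them as separate clusters, each loosely corresponding to one "event" in the sense of the abstract. The inflation parameter controls cluster granularity; higher values tend to produce more, smaller clusters.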

